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Abstract 

This is the write-up of three lectures on algorithms for dynamical fermions that 
were given at the ILFTN workshop 'Perspectives in Lattice QCD' in Nara during 
November 2005. The first lecture is on the fundamentals of Markov Chain Monte 
Carlo methods and introduces the Hybrid Monte Carlo (HMC) algorithm and 
symplectic integrators; the second lecture covers topics in approximation theory and 
thereby introduces the Rational Hybrid Monte Carlo (RHMC) algorithm and ways of 
evading integrator instabilities by means of multiple pseudofermion fields; the third 
lecture introduces on-shell chiral (Ginsparg-Wilson) lattice fermions and discusses 
five-dimensional formulations for computing fermion propagators for such fermions. 



1 Introduction 



This is a written version of a set of lectures on algorithms for dynamical fermions. The 
organization of these lecture notes is as follows. 

In the first lecture (§2) we introduce some 'building blocks' from which we can construct 
Monte Carlo algorithms for the evaluation of the functional integrals that describe quantum 
field theories on the lattice, and in particular for such computations including the dynamical 
effects of fermions. After introducing the basic idea of Monte Carlo integration and proving 
the Central Limit theorem we introduce Markov chains (§2.3), and prove the basic theorem 
on their convergence (§2.4). We explain how detailed balance and the Metropolis algorithm 
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provide a simple way of constructing Markov steps with some specified fixed point, and then 
how we can construct composite Markov steps that are likely to be ergodic. 

Next we briefly introduce perfect sampling by the method of 'Coupling from the Past' that 
in some cases lets us generate completely uncorrelated samples from the exact fixed point 
distribution of a Markov chain. Following this we consider the effects of autocorrelations, 
and introduce the Hybrid Monte Carlo (HMC) algorithm as a way of reducing them without 
having to resort to approximating the equilibrium distribution. The requirement HMC has 
for reversible and area preserving integrators for Hamiltonian systems leads us to analyze 
symplectic integrators (§2.13) using the Baker-Campbell-Hausdorff (BCH) formula (§2.12). 
The amplification of floating point arithmetic rounding errors by such integrators is consid- 
ered in §2.16, where we also see the effects of the instabilities (§2.13.1) that must occur in 
symplectic integrators when the step size becomes too large. The use of multiple timescale 
integration schemes to avoid these instabilities is discussed in §2.14. Finally we discuss the 
use of pseudofermions within the HMC algorithm to handle dynamical fermions (§2.15). 

We begin the second lecture by introducing the theory of optimal polynomial approximation 
and establishing Chebyshev's criterion. After elucidating the role of Chebyshev polynomials 
(§3.3) we consider Chebyshev optimal rational function approximation in §3.4. A discussion 
of the significance of rational approximations for functions of matrices is given in §3.6. The 
use of multiple pseudofermion fields (§3.8) to reduce the fluctuations in the force exerted by 
the pseudofermions on the gauge fields was introduced by Hasenbusch (§3.8.1), and the way 
in which this may be implemented using RHMC follows. 

The results of numerical comparison of RHMC with the R algorithm are given both for 
finite temperature QCD (§3.11.1) and for domain wall fermions (§3.11.2). Data showing 
the efficacy of using multiple timescale integrators for multiple pseudofermions follows, as 
does data showing how these methods succeed in 'bringing down the Berlin Wall' for Wilson 
fermions. 

The third lecture (§4) is concerned with five-dimensional algorithms for Ginsparg-Wilson 
(GW) fermions. We introduce the concept of chiral lattice fermions in what we believe to 
be a logical (rather than historical) approach, starting with Liischer's on-shell chiral trans- 
formation, deriving the GW identity from it, and then showing that Neuberger's operator is 
essentially the unique solution (up to the choice of kernel, and assuming 75 hermiticity). 

We then turn to a class of five- dimensional algorithms to invert Neuberger's operator: in 
these the Schur complement (§4.5) of a matrix plays a central role. The algorithms are char- 
acterized by four independent choices: the kernel of the Neuberger operator (§4.4); whether 
we introduce five-dimensional pseudofermions as dynamical fields in a Markov process or 
just view them as constraints in the computation of the inverse of the Schur complement 
(§4.6); the choice of rational approximation of the sgn (signum) function (§4.7); and the 
choice of the five dimensional matrix used to linearize the approximation to the Neuberger 
operator (§4.8). The different choices of five dimensional matrices correspond to different 
ways of representing a rational functions; as a continued fraction, a partial fraction, or as 
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a Euclidean Cayley transform. The latter lead directly to the domain wall fermions for- 
mulation and its generalizations (§4.8.3). Finally we consider the characterization of chiral 
symmetry breaking in these approaches, and look at the results of preliminary numerical 
studies. 



2 Building Blocks for Monte Carlo Methods 

We start by reviewing the basic ideas of Monte Carlo integration and Markov processes. [1] 
2.1 Monte Carlo Methods 

The basic idea of Monte Carlo integration is the mathematical identification of probabilities 
with integration measures: we evaluate an integral by sampling the integrand at points 
selected at random from a probability distribution proportional to the integration measure. 

Of course, there are much better methods for carrying out numerical integration (quadrature) 
in low dimensional spaces; however, all these methods become hopelessly expensive for high 
dimensional integrals. Since in lattice quantum field theory there is one integration per 
degree of freedom, the only practical way to carry out such integrations is to use Monte 
Carlo methods. 

The fundamental objective of (Euclidean) quantum field theory is to compute the expectation 
value of some operator £l((f>) that depends on the field 

where the action is S(4>), the measure is d(f), and the partition function Z is introduced to 
impose the normalisation condition (1) = 1. 

In order to calculate this expectation value, we generate a sequence of field configurations 
(0i, 02, • • • , 4>u ■ ■ ■ i 0jv) chosen from the probability distribution 

P(0 t )d0 t = ie- s ^; 

how this may be done will be explained later (§2.3). We then measure the value of the 
operator Q on each configuration, and compute its average over all the configurations 

1 N 

This sample average, which we denote by writing a bar over the quantity averaged, is to be 
contrasted with the expectation value which is denoted by enclosing the quantity in angle 
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brackets. The law of large numbers then tells us that the configuration average Q tends to 
the expectation value (Q) as N, the number of configurations sampled, tends to infinity, 

(SI) = lim 0. 



2.2 Central Limit Theorem 

The Laplace-DeMoivre central limit theorem establishes the stronger result that under very 
general conditions the sample average tends to become Gaussian distributed with the expec- 
tation value (Q) as its mean and with a standard deviation which falls as 1/V7V, 



P(f2) ~ constant x exp 



(n-<n» : 



2C 2 /N 

where C 2 = {(SI - (Q)) J is the variance of the distribution of SI. Note that the central 
limit theorem is an asymptotic expansion in 1/y/N for the probability distribution of 0. 



In order to prove the central limit theorem let us consider the distribution of the configuration 
average: the distribution of the values of a single sample, uj = Sl((fi), is 

PnH = f d(f>P(<f)) 5(lo - n((f))) = (5(u - n(<f>)) 

If we take the logarithm of the Fourier transform of this we obtain the generating function 
for the connected moments (cumulants) of Q, namely 



Wn(k)=\n J dcuPn(u>)e ik " = In J rf0P(0)e' 



kn(<t>) 



TV. 

n=0 

where the first few cumulants 1 are 

,4 



c = o, c 3 = ((n-(n)Y), 

c 1 = (n), c 4 = ((n-(n)Y)-3Cl 

c 2 = ((n-(n}) 2 ), 



Next we consider the distribution of the value of the average of iV samples, 

f 1 N 

P n (co) EE /#!•.. d(j> N P(0x) • • • P^N) ^-JrY. ^t)) , 
J t=l 



1 C\ and C2 are the mean and variance, a = v / C ? 2 is the standard deviation, and C 3 /cr 3 and C4/ <r 4 are 
called the skcwness and kurtosis. 
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and we construct its connected generating function 
W n (k)=\n [ du P^u)e lk[d 



, ., N 

In / # 1 .--#^P(0 1 ).--P(0 iV ) exp(^-^fi(0 t )) 
J t=i 



= ln 



d4>P{<t))e 



ikn{4>)/N 



N 



N\n{e lkn ' N ) 



NW n (k/N) = 



n=l 



(iky c n 



We may take the inverse Fourier transform of Wq to obtain an explicit expression for the 
distribution Pq, 



P^ LJ ) = — [ dke w ^ k) e~ ikQ 
2ix J 



cxp 



cxp 



.71=3 



■l) n C„ / d V 



nliV™- 1 \du J 



- 



I 



3\N 2 \du 



C 4 



4!iV 3 \du 



dk 
2^ 



d 



exp 



ifc(fi) + {ikf^-iku 



exp 



2C 2 /iv 



This is an asymptotic expansion because the cumulant expansion in general only converges 
for sufficiently small values of \k\, whereas the integral is over all values of k. It can be shown 
that this leads to corrections of 0(e~ aN ) for some constant a > that, for any given value 
of N, will exceed the 1/N e term in the series for some i. [2] 

The distribution Pq tends to a 5 function as N — » 00, and in order to see its Gaussian nature 
it is useful to rescale its argument to £ = \HSf (uo — (fi)) , in terms of which 

di 



P n (u) = 



du 



with 



F(0 



1 me - 3c 2 ) 

6C?a/]V 



e -? 2 /2C 2 



Figure 1 illustrates the central limit theorem by showing how the scaled probability distribu- 
tion approaches a Gaussian distribution for the case where a single sample x is chosen 
uniformly in — § < x < §. 



2.3 Markov Chains 



In order to implement the Monte Carlo integration procedure outlined in the previous section 
we need to generate a sequence of configurations distributed with probability proportional to 
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Figure 1: A simple illustration of how the probability distribution of the rescaled sample 
mean approaches a Gaussian. The four curves show the functions Fn{0 of the scaled variable 
£ = xVW for N = 1, . . . , 4. These functions are defined by F N (^)d^ = Pfq{x)dx, where 

the unsealed probability distributions are Pn(x) = / dxx ■ ■ ■ dx^ P\{x\) ■ ■ ■ -Pi (a? at) Six — 



^Ej=i starting with the very simple (and very non-Gaussian) distribution Pi(x) = 



e . The only practicable way that we can do this is by utilising Markov chains, a procedure 
that mathematicians call Markov Chain Monte Carlo (MCMC). 

Let Q be the state space: for example in the present case each point in this space is gauge 
field configuration. We consider an ergodic stochastic mapping P : Q — > Q; by 'stochastic' 
we mean that P(j <— i) gives the probability that state i will be mapped to state j, but that 
the actual state that i is taken to is selected at random with this probability. By 'ergodic' 
we mean that the probability getting from any state to any other state is greater than zero. 
The key to analyzing the properties of a Markov chain is to think of it as a deterministic 
mapping of probability distributions on the state space rather than as a stochastic mapping 
on the state space per se. A probability distribution is a mapping Q : Q — > R which is 
positive and normalized, Q(x) > Vx e Q and J Q dxQ(x) = 1. If we call the space of all 
such mappings 2 Qn then the Markov process P induces a map P : Qn — > Qn- 

2 Strictly speaking we define Qq to be the space of equivalence classes of such mappings, as two probability 




x). 
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2.4 Convergence of Markov Chains 



The principal result in the theory of Markov chains is that whatever distribution of states 
we start with (for example we could start with a very specfic state, such as a totally ordered 
gauge configuration or 'cold start', or we could choose our starting state entirely at random, 
a 'hot start'), the sequence of distributions generated by repeatedly applying the Markov 
mapping P converges to unique fixed point distribution Q. The purpose of this section is to 
establish this result. 

To this end we introduce a metric on the space Qq of probability distributions by defining 
the distance 3 between two probability distributions Q 1 and Q 2 

d(Q 1 ,Q 2 ) = J dx\Q 1 (x)-Q 2 (x)\. (1) 

Our approach to the proof is that the Markov process is a contraction mapping with respect 
to this distance: when we apply P to two probability distributions the distance between 
the resulting distributions is smaller than the distance between the two original probability 
distributions 

d(PQ 1 ,PQ 2 ) < (l-a)d(Q 1: Q 2 ), 

where < a < 1. 

Once we have established this, we may argue that P is a contraction mapping with respect 
to the metric of equation (1), so the sequence (Q, PQ, P 2 Q, P 3 Q, . . .) is Cauchy, namely that 
for any e > there is an integer N such that for all n > m > N we have d(P m Q, P n Q) < e. 
This is just the Banach fixed-point theorem, which is easily proved as follows: 

n— m— 1 

d(P m Q, P n Q) < d(P m+3 Q, P m+j+1 Q) 

3=0 

by repeated application of the triangle inequality, 

n—m—l oo 

< (! - a) j d{P m Q, P m+1 Q) < ^(1 - ayd{P m Q, P m+1 Q) 

3=0 3=0 

= -d(P m Q, P m+1 Q) < K - >—d{Q, PQ) < K - '—d(Q, PQ) < e, 

a a a 

where the final inequality holds provided that 

. . ea 
In 

N> — 



d(Q,PQ) 



ln(l - a) 



distributions are to be considered equal if their difference vanishes almost everywhere: that is, if their 
difference vanishes except on a set of measure zero. 

3 It is readily verified that this satisfies the axioms d(x, x) — 0, d(x, y) — d(y, x), and the triangle inequality 
d(x, y) < d{x, z) + d(z, y) Vx, y, z € Qq; indeed, it just corresponds to the L\ norm. 
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or d(Q,PQ) = 0. Note that the condition d(PQi, PQ 2 ) < d(Qi,Q 2 ) is not strong enough 
to prove this; indeed, this weaker inequality holds without requiring the assumption of 
ergodicity which is in general necessary for the result to hold. As the space of probability 
distributions is complete we can conclude that the sequence converges to a unique fixed point 
Q = lim^ P n Q. 

In order to show that the Markov process is a contraction mapping, we proceed as follows: 



d(PQ u PQ 2 ) = J dx\PQ 1 (x)-PQ 2 (x)\ 



dx J dyP(x <- y)Qi(y) - J dyP(x <- y)Q 2 (y) 
= J dx J dyP{x <- y)AQ(y) 



where we have defined the quantity AQ(y) = Qi(y) — Q 2 (y), 

= j dx J dyP(x^y)AQ(y)[9(AQ(y))+9(-AQ(y)) 



upon inserting the trivial step function identity 6{y) + 0(—y) = 1. Next we make use of the 
identity ||a| — = \a\ + |6| — 2min(|a|, which is readily established by considering the 
cases \a\ > \b\ and \a\ < \b\ separately, 

= j dx j dyP(x^y)\AQ(y)\ 

-2 J dxmm J dy P(x <- y)AQ(y)0(±AQ(y)) 

The first term simplifies if we note that j dxP(x <— y) — 1 (the probability of going from 
state y to somewhere must be unity), so 



d(PQ 1 ,PQ 2 ) 



< 



We now observe that 



= J dy\AQ(y)\-2 J dxmm J dyP(x <- y)AQ(y)6(±AQ(y)) 
J dy\AQ(y)\-2 J dx inf P(x <- y) mi: 



dyAQ(y)9(±AQ(y)) 



hence 



J dyAQ(y)e(AQ(y))+ J dy AQ(y)6(-AQ(y)) 

= J dyAQ(y) = J dyQ l (y) - J dyQ 2 (y) = 1-1 = 0, 

J dyAQ(y)6(±AQ(y)) = \ J dy\AQ{y)\. 
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We thus finally reach the desired result 



d(PQi,PQ 2 )< J dy\AQ(y)\- J dxw£P(x<^y) J dy\AQ(y)\ 
<(l-a)d(Q 1 ,Q 2 ), 
with < a = f dx mi y P(x <— y) < 1. 

Suppose that we can construct an ergodic Markov process P that has some desired distri- 
bution Q as its fixed point. We then start with an arbitrary state ('field configuration') 
and iterate the Markov process until it has converged ('thermalized'). Thereafter, successive 
configurations will be distributed according to Q, but in general they will correlated. 

An important point is that we only need the relative probabilities of states Q(i)/Q(j) to 
construct P: we do not need to know the absolute normalization of Q. Conversely, suppose 
we want to evaluate f n dxw(x) with w(x) > \/x E fl by Monte Carlo using a Markov chain 
to generate suitably distributed samples of x E il. The Markov chain samples x E f2 with 
probability proportional to w(x), but gives us no hint as to what the absolute probability 
is, so we are unable to find the value of the integral. In other words, we cannot use Markov 
chains to compute integrals directly, but only ratios of integrals of the form 

f n dxw(x)f(x) 
J n dxw(x) 

Fortunately this is usually what we want in quantum field theory where we are not interested 
in the value of the partition function per se. 



2.5 Detailed Balance and the Metropolis Algorithm 

We now consider how to construct a Markov process with a specified fixed point 

Q(x) = J dyP(x^y)Q(y). 

A sufficient (but not necessary) condition is to make it satisfy detailed balance 

P(y <- x)Q(x) = P(x <- y)Q(y) : 

we can easily show that this implies the fixed point condition by integrating both sides with 
respect to y. One simple way of implementing detailed balance is the Metropolis algorithm 
where we select a candidate state x E f2 at random 4 and then accept it with probability 

4 It is not necessary to choose the candidate entirely at random; it suffices that the probability of choosing 
x when starting from y is the same as the probability of choosing y when starting from x. 
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or otherwise keep the initial state y as the next state in the Markov chain. We can show 
that this implies detailed balance (and hence has Q as its fixed point) by considering the 
cases Q(x) > Q(y) and Q(x) < Q(y) separately. 5 

The particular form of the acceptance probability of equation (2) is not unique: other choices 
are possible, e.g., 

P( \ = 9M 

[x ^ y) Q(x) + Q( y y 

but they have lower acceptance. 



2.6 Composition of Markov Steps 

The reason why we can construct bespoke Markov processes from a toolkit of methods is 
that we can combine different Markov steps together. Let P\ and P2 be two Markov steps 
that both have the desired fixed point distribution, but are not necessarily ergodic. Then 
the composition of the two steps P2 o Pi is a Markov step that also has the desired fixed 
point, and it may be ergodic. This trivially generalizes to any (fixed) number of steps. For 
the case where Pi is not ergodic but P™ is the terminology weakly and strongly ergodic is 
sometimes used. 

This result justifies 'sweeping' through a lattice performing single site Metropolis updates. 
Each individual single site update has the desired fixed point because it satisfies detailed 
balance; the entire sweep therefore has the desired fixed point, and furthermore is ergodic. 
On the other hand, the entire sweep does not in general satisfy detailed balance; 'undoing' 
the single site updates would correspond to sweeping through the lattice in the reverse order. 
Of course it would satisfy detailed balance if the sites were updated in a random order, but 
this is neither necessary nor desirable (because it puts too much randomness into the system). 



2.7 Coupling from the Past 

It would appear that two of the fundamental limitations of MCMC are that the distribution 
of states generated only converges to the desired fixed point distribution and never exactly 
reaches it, and that successive states are necessarily correlated to some extent. Before we 
turn to methods to alleviate these problems it is interesting to note that there is a way of 
sampling a sequence of completely independent states from the exact fixed point distribution. 

The method of coupling from the past or perfect sampling was introduced by Propp and 
Wilson. [3] Imagine that we have some ergodic Markov chain, where the stochasticity of 
each step is implemented by using a book of random numbers. For example, if the system 
is in state i at step k then we select the n th random number r k G K from our book, where 

5 The case x = y is special as it must 'mop up' all the rejections. 
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< r k < 1, and set the new state at step k + 1 to be j where 





We now ask what state / will the system be in at step if it was in state i at step — iV? 
(The use of negative step numbers is just a notational convenience). This has a well-defined 
answer that depends on the random numbers r_ N , r_ N+ i, . . . , r_i in our book. If N is large 
enough then with probability one / will be independent of i, because there is a positive 
probability at each step that any two states will map to the same state, and thereafter they 
will continue through the same sequence of states since we are using the same sequence of 
random numbers from our book (I like to call this the 'flypaper principle,' once two states 
have coalesced they stay together forever, and ergodicity guarantees that all the states must 
coalesce eventually). 

Coupling from the Past therefore just consists of finding this state /; it will be sampled from 
the exact fixed point distribution because it is the state that the Markov chain would reach 
at step if it were started at step — oo. We can then repeat the entire procedure using a 
different book of random numbers to get a completely independent sample. 

So far the algorithm is entirely impractical, as it requires following the sequence of states 
visited starting with each state i at step — N, and in cases of interest the number of states 
is extremely large. What makes it practicable in some situations is the existence of a partial 
ordering of the states with a largest and a smallest state (this is what mathematicians call 
a lattice) that is preserved by each Markov step. In other words we have an ordering 6 such 
that i >z j =>• i' b j' where the Markov step takes the unprimed states into the corresponding 
primed ones, using the same random number(s) in both cases. The ordering need only be 
a partial ordering, so for any pair of states i and j we can have i >z j, j >z i, both, or 
neither. Nevertheless the ordering is a lattice, so there is an s min and an s max satisifying 
Sma X t: i t: s min Vi . With such a partial ordering we just need to see if the sequence of states 
starting with s min and s max coalesce: if they do then all states must coalesce and end at the 
same state / at step 0; if they do not then we need to increase N and repeat the calculation 
(using the same book of random numbers, of course). 

An interesting non-trivial example where this is practicable is provided by the Ising model, 
where a state consists of an array s of spins each taking values ±1, and the desired fixed point 
distribution is Q(s) oc exp[/3 Yliij) s « s i] where the sum is over all pairs of nearest-neighbour 
spins and (5 > 0. The Markov update step consists of sweeping through the lattice updating 
each spin in turn from a heatbath, 7 specifically the new value of the spin at site % is chosen, 
independently of the old value s,, by taking a random number < r < 1 and setting 



6 An order relation satisfies the axioms x >zy Ay >z x x = y, and x>y/\y>z^>x>z Vx, y, z. 

7 In general a heatbath algorithm is one that directly samples the desired distrution. This is straightforward 
for a single-site update in the Ising model, and with more effort can also be done for single-link updates in 
SU(2) gauge theories. 
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with Z = exp[/3 s j] + exp[— (3 YliUj) s j\- The partial ordering is that s y t if s« > tjVi, so 
for example 101101 y 000101 but 101101 and 011001 are not comparable. s max is the state 
with all spins +1, and s min that with all spins — 1. We just verify that the heatbath update 
preserves this ordering: if the nearest neighbours of the spin s« being updated are the same 
then this is trivial as the new value, s- = is the same for both states. If the some of the 
neighbours differ then s y t =>• s j ^ ^2(ij) A?' anc ^ ^hus ^ ~ — 1 =^ s i = — 1 so s' >: t'. s 

2.8 Autocorrelations 

Successive states in a Markov chain are correlated in general. There are two different ways 
to measure this autocorrection between states: the first is the exponential autocorrelation 
(§2.8.1), which is an intrinsic property of the Markov process itself; and the second is the 
integrated autocorrelation (§2.8.2) for some observable Q, which is more useful insofar as it 
is directly related to the statistical error of its measured estimator Cl. 

2.8.1 Exponential Autocorrelation 

In §2.3 we proved that an ergodic Markov process converges to a unique fixed point. In terms 
of the transition matrix = P(i <— j) this corresponds to P having a unique eigenvector 
with eigenvalue one and all its other eigenvalues lying strictly within the unit circle in the 
complex plane. In particular, the magnitude of the largest subleading eigenvalue must be 
smaller than l, 9 |A max | < 1. Any initial state vector may be expanded in a basis of normalized 
eigenvectors of P, v — with Pui = \ui and Ai = 1 > A 2 = A max > A 3 > • • •, 



so the leading deviation from the equilibrium state represented by U\ is of magnitude |A^ ax | = 
e Jvin|A max | _ e -N/N cxp ^ which f a vj s G ff exponentially with the number of Markov steps with a 

characteristic scale or exponential autocorrelation time of 



2.8.2 Integrated Autocorrelation 

Consider the autocorrelation of some operator Q measured on a sequence of successive con- 
figurations from a Markov chain. Without loss of generality we may assume (Q) = 0. The 

8 Of course s< = -1 & t\ = -1. 

9 Indeed, from the proof of §2.3, |A max | < 1 — a with a = min^ P^. 
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variance of the estimator (sample average) Q is 



1 



N N 



*=i f=i 

AT AT-1 N 

t=l t=l t'=t+l 

and if we introduced the autocorrelation function (which is independent of i) 

<fi(0 w )fi(0 t )> 



cm = 



this becomes 



N-l 



<^>=i{(^>+-|j:(jv-Qc fiM <^>}. 

The autocorrelation function must fall faster than the exponential autocorrelation 

\C n (l)\<\L, = e~ e/N -, 
so, for a sufficiently large number of samples N 3> iV cxp , 



oo 

(Q 2 ) = {l + 2j2Cn(i)} 



TV 



(1 + 2A n ) 



ft 2 



TV 



1 + 



rN 

1 + (ir 



TV 



where we have defined the integrated autocorrelation function 

oo 

A a = ^CM- 



1=1 



This result tells us that on average 1 + 2An correlated measurements are needed to reduce 
the variance of by the same amount as a single truly independent measurement. 



2.9 Hybrid Monte Carlo 

In order to carry out Monte Carlo computations that include fermion dynamics we would 
like to find an algorithm which has following features: 



• it updates the fields globally, because single link updates are not cheap if the action is 
not local; 
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• it takes large steps through configuration space, because small-step methods carry 
out a random walk which leads to critical slowing down with a dynamical critical 
exponent 10 z = 2. 

• it does not introduce any systematic errors. 

A useful class of algorithms with these properties is the (Generalized) Hybrid Monte Carlo 
(HMC) method. [4] In the HMC algorithm, we introduce a 'fictitious' momentum p for each 
dynamical degree of freedom q, and we construct a Markov chain with fixed point e~ H( - q ' p ^ 
where H is the fictitious Hamiltonian \p 2 +S{q): here the action S of the underlying quantum 
field theory plays the role of the potential in the fictitious classical mechanical system. This 
generates the desired distribution e~ s ^ if we ignore the momenta p (statisticians call the 
distribution of q ignoring p a marginal distribution) . 

The HMC Markov chain alternates two Markov steps: the first step is Molecular Dynamics 
Monte Carlo (MDMC) (§2.10), and the second is (Partial) Momentum Refreshment (§2.11). 
Both have the desired fixed point, and their composite is clearly 11 ergodic. 



2.10 MDMC 

If we could integrate Hamilton's equations exactly we would follow a trajectory (q,p) — > 
(q',p') of constant fictitious energy, H(q,p) = H(q',p'), for fictitious time r: this corresponds 
to traversing a set of equiprobable fictitious phase space configurations. Liouville's theorem 
tells us that this trajectory preserves the measure 12 dq A dp = dq' A dp', and reversing the 
momenta at the end of the trajectory ensures that it is reversible, (q',—p') — > (q,—p), so 
such an update satisfies detailed balance and therefore would provide the desired Markov 
step. 

Of course in general we cannot integrate Hamilton's equations exactly, but if we can find an 
approximate integration scheme that is exactly reversible and area-preserving (q.v., §2.13) 
then it may be used to suggest configurations to a Metropolis test with acceptance probability 
min(l, e~ 5H ), where SH = H(q',p') — H(q,p) is the amount by which our integrator fails to 
conserve fictitious energy. This too gives a Markov step that satisfies detailed balance. 

We therefore build the MDMC Markov step out of three parts: 

1. Molecular Dynamics (MD), an approximate integrator U(r) : (q,p) i— > (q',p') that is 

10 z relates an autocorrelation to a correlation length of the system, An oc £ 2 . The correlation length £ is a 
characteristic length-scale of the physical system that diverges as the system approaches a continuous phase 
transition such as the continuum limit in the case of lattice quantum field theory 

n But it is not clear that this can be proved rigorously for any but the simplest systems. 

12 This is required for detailed balance to be satisfied for a continuous system. 
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exactly area-preserving 



det [/* = det 



'd(q',p'Y 
. d(q,p) _ 



and reversible 



F o U(t) o F o C/( t ) = 1; 



2. a momentum reveral F : p i— > — 

3. a Metropolis test. 

The composition of these, implementing the Metropolis test using a uniform random number 
< r < 1, gives 



2.11 Partial Momentum Refreshment 

The MDMC steps enables us to find acceptable candidate points in phase space far from 
the starting place, but it is far from ergodic because it almost stays on a constant fictitious 
energy surface. We remedy this by alternating it with a momentum refreshement which 
updates the momenta p from a Gaussian heatbath without touching the q: while this is also 
manifestly not ergodic it can easily make large changes in the fictitious energy. 

Partial momentum refreshment is a minor generalization of this: it mixes the old Gaussian 
distributed momenta p with Gaussian noise £, 



The Gaussian distribution of p is invariant under F. The extra momentum reversal F ensures 
that for small 9 the momenta are reversed after a rejection rather than after an acceptance. 
For 9 = 7i / 2, which is what is chosen for the standard HMC algorithm, all momentum 
reversals are irrelevant. 

The reason for introducing partial momentum refreshment in GHMC (also known as second- 
order Langevin or Kramer's algorithm) was that for small 9 and r it introduces a small 
amount of noise frequently into the classical dynamics, rather than introducing a lot of noise 
occasionally as in HMC, where 9 = it/ 2 and r « £. Unfortunately any benefits this may have 
are negated by the fact that the momentum has to reversed after each Metropolis rejection, 
leading to 'zitterbewegung' back and forth along the trajectory. 



CD " (/) = [f ° u{t) 9{e ~ SH - r ^ +i9 ^- e "")] @ 




cos 9 sin 9 
— sin 9 cos 9 
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2.12 Baker-Campbell-Hausdorff (BCH) formula 



Fortunately there is a large class of reversible and area-preserving integrators: symmetric 
symplectic integrators (§2.13). A useful tool to analyze these is the BCH formula: if A 
and B belong to an associative algebra then e A e B = e A+B+s , where 5 is constructed from 
commutators of A and B, i.e., is in the Free Lie Algebra generated by the set {A, B}. More 
precisely, \ia(e A e B ) = J2 n>1 c n , where c\ = A + B, and 

c «+i = — — rl - \ ad Cn (A - B) 
n + I { 



m=0 fel,...,fc2m>l 
felH hfc2m=" 



where adx F = [X, F] , and the B n are Bernoulli numbers. Explicitly, the first few terms are 
ln(e A e B ) = {A + B) + \[A, B] + ^{ [A [A fi]] - [B, [A, 5]] } 

[A [A 5]]] 

+^{ -4 [5, [A, [A [A 5]]]] -6 [[A [A [A 5]]] 

+ 4[B,[B, [A, [A, B]}}}-2 [[A, B],[B, [A, B]] } 

- [A, [A, [A, [A,B]]]] + [B,[B,[B,[A,B]}}}} + ..-; 

from this we easily obtain the formula for a symmetric product 

\n(ei A e B e- A ) = {A + B) - ^{2 [B, [A, B}} + [A, [A, B}} ) 

+^{ 32 [B, [B, [A, [A,B]]]] - 16 [[A,B], [B,[A,B]]] 

+ 28 [B, [A, [A, [A,B]]]] + 12 [[A,B], [A,[A,B]]] (3) 
+ 8[B,[B,[B,[A,B]]}}+ 7[A,[A,[A,[A,B]}}]} + .- - . 

2.13 Symplectic Integrators 

We are interested in finding the classical trajectory in phase space of a system described by 
the Hamiltonian 

H(q,p) = T(p) + S(q) = h? + S(q). 
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The basic idea of symplectic integrator is to write the time evolution operator as 



exp 



( T |) 



:exp l r 



= exp r 



= exp r 



dp d dq d 
dt dp dt dq 
dH d dH d 



+ 



dq dp dp dq 



which relates a time derivative on the left to a linear differential operator on phase space, 
h, on the the right. In differential geometry such an operator is a vector field, and in the 
particular case where it is derived from a Hamiltonian function, 

dH d dH d 

h = l(H) 



i lf> dq dq dp' 
it is called a Hamiltonian vector field. Let us define 

Q = b(T) = T'(p)|-, P = l(S) = S'(q)-^, 

so that h = P + Q. Since the kinetic energy T is a function only of p and the potential 
energy S is a function only of q, it follows that the action of e rP and e rQ may be evaluated 
trivially: by Taylor's theorem 

e TQ :f(q,p)^f(q + TT'(p),p), 
e rP ■ f(q, p)>-> f{q,P~ rS'(q)) . 

It also means that these maps are area-preserving, 

d{e rQ (q,p)) 



d{q,p) 



— (r?>)-i. 



S(<1, p) \-rS"(.q)l 

From the BCH formula (3) we find that the PQP symmetric symplectic integrator is given 
by 



t/St 

t/St 



= (exp{ (P + Q)5r - £ ( [P, [P, Q}} + 2 [Q, [P, Q]] ) 5r 3 + 0(5r 5 ) }) 
= exp{r ((P + Q) - i ( [P, [P, Q]] + 2 [Q, [P, Q]])5r 2 + 0((5r 4 )) } 

= e rh' = e T(P+Q) + 0{6T 2y 
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Such symmetric symplectic integrators are manifestly area-preserving and reversible in addi- 
tion to conserving energy to 0(St 2 ). What is even more remarkable is that since the vector 
field hi is built out of commutators of Hamiltonian vector fields it is itself a Hamiltonian 
vector field: 

h' = Lm = —--—-. 

dp dq dq dp' 

This means that for each symplectic integrator there exists a Hamiltonian H' that is exactly 
conserved. 

In fact we can calculate H' quite easily. If hi = l(Hi) and h 2 = t(H 2 ) are two Hamiltonian 
vector fields derived from Hamiltonians Hi and H 2 respectively then their commutator h 3 = 
[hi, h 2 ] is a Hamiltonian vector field derived from the Hamiltonian that is the Poisson bracket 
of the Hamiltonians, 

h 3 = [hi,h 2 ] = L (H 3 ) where H 3 = {Hi, H 2 } = - 

Poisson brackets satisfy the Jacobi relation {A, {B, C}} + {B, {C, A}} + {C, {A, B}} = 0, 
so they endow the space of Hamiltonian functions with a Lie algebra structure. Indeed, 
the exactly conserved Hamiltonian for our PQP integrator may be written using the BCH 
formula (3) with the substitution of Poisson brackets for the corresponding commutators, 
[Q, P] = [l(T), l(S)\ i-> {T, S}. With this technology it is easy to see that 

H' = H+±(2p 2 S' - S ,2 )5t 2 

+ ^(-p 4 S^ + 6p 2 (S'S"' + 2S" 2 ) - 3S ,2 S")5t 4 + 0(5r 6 ). 

This expansion converges for small values of 5r, and presumably up to the value of 5r for 
which the integrator becomes unstable (q.v., §2.13.1). 

Note that H' cannot be written as the sum of a p-dependent kinetic term and a g-dependent 
potential term, so we cannot make use of this to find an integrator that exactly conserves H. 
Moreover, as H' is conserved, SH = H(q',p') — H(q,p) = [H(q',p') — H'(q',p')} — [H(q,p) — 
H'(q,p)] is of 0(5t 2 ) for trajectories of arbitrary length even if r = 0(5r~ k ) with k > 1. 



2.13.1 Integrator Instability 

What happens if we take the integration step size St to be large? Clearly the Metropolis 
acceptance rate will fall as St increases, but this behaviour undergoes a sudden change at 
some value of St where the integrator goes unstable. We can see this for our PQP integrator 
even for the simple case where S = \q 2 ; [5] in this case the update U (St) is a linear mapping 

A _> (A - ( 1 - ^ r2 dt \ (* 

PJ V) {St + Hr 3 1 - 1ST 2 ) [p 

The determinant of this matrix is unity because of area-preservation, and its characteristic 
polynomial has discriminant St 2 (St 2 — 4). When St = 1 the system changes from having two 
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complex conjugate eigenvalues of magnitude one, e ± ^(0 G l), to having two real eigenvalues, 
e ±u [y 6 1); for 5r > 2 this means that the errors increase exponentially with characteristic 
Liapunov exponent v. 



2.14 Multiple Timescales 



We are not restricted to using simple symmetric symplectic integrators such as those de- 
scribed so far. [6] Suppose that the Hamiltonian is split into pieces 



H(q,p)=T(p) + S 1 (q)+S 2 (q), 



then we can define 



Q = l(T) = P % = L {Si) = S'M^ 

so that h = Pi + P 2 + Q- We may introduce a symmetric symplectic integrator of the form 

JtP 1 



exp (gp 2 ) exp {^Q 



exp (^q) exp (gp 2 



n-v t/8t 



We have a lot of freedom: all we need do is assemble the pieces symmetrically and ensure the 
the leading term in the BCH expansion is H. The remaining freedom can be used to reduce 
or eliminate higher-order errors in 5H, or to make the step size small for a particular force 
term so as to avoid instabilities. For instance, if 2ri|| Pi || oo H-P2II00) then the instability 
in the integrator is tickled equally by each sub-step. This helps if the most expensive force 
computation does not correspond to the largest force. 



2.15 Dynamical Fermions 

The direct simulation of Grassmann fields is not feasible: the problem is not that of manip- 
ulating anticommuting Grassmann variables in a computer, but that e~ Sp = e~^ M ^ is not 
positive definite and this leads to poor importance sampling and thus a huge variance in 
measured quantities. 

We therefore integrate out the quadratic fermion fields to obtain the fermion determinant 

J di> # oc det M. 

The overall sign of the exponent is unimportant. 
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Any operator ip, tp) can be expressed solely in terms of the bosonic field by Schwinger's 
technique of adding source terms ipr] + fjip to S F before integrating over the fermion fields: 



■qM(4>) 1 rj 



r)=7]=0 



e.g., the fermion propagator is 

G(x,y) = (i/,(x)j(y)) = M- 1 (x,y). 



One obvious way of proceeding would be to include the determinant as part of the observable 
to be measured and computing a ratio of functional integrals 



(det M(<PM<f>)) Sb 
(det Mio)) 



with S B (4>) being the bosonic (gauge field) part of the action; but this is not feasible because 
the determinant is extensive in the lattice volume, and we get hopelessly poor importance 
sampling. 

We therefore proceed by representing the fermion determinant as a bosonic Gaussian integral 
with a non-local kernel 



d*M*)«/<*fe«P[-X*-'(*)x]. 



The fermion kernel must be positive definite (all its eigenvalues must have positive real 
parts) as otherwise the bosonic integral will not converge. The new bosonic fields are called 

pseudofermions. 

It is usually convenient to introduce two flavours of fermion and to write 

(det M((j))) 2 = det (M{<p)M ] {<p)) oc f d x d X exp [-x{M ] M)~ 1 x] ■ 



This not only guarantees positivity, but also allows us to generate the pseudofermions from 
a global heatbath by applying to a random Gaussian distributed field. 

The equations for motion for the boson (gauge) fields are 

= 7T, 

. _ _dS* _ ^djM^M)- 1 

71 ~ d(/) X d(j) X 

^^ + [i M<M)-^ a -^l[(M^ x ]. (4) 

The evaluation of the pseudofermion action and the corresponding force then requires us to 
find the solution of a (large) set of linear equations A4^A4ip = x- 
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It is not necessary to carry out the inversions required for the equations of motion exactly, 
there is a trade-off between the cost of computing the force and the acceptance rate of the 
Metropolis MDMC step. The inversions required to compute the pseudofermion action for 
the Metropolis accept/reject step do need to be computed exactly however. We usually take 
'exactly' to by synonymous with 'to machine precision'. 

2.16 Reversibility 

We now want to address the question as to whether HMC trajectories are reversible and area- 
preserving in practice. [5] The only source of irreversibility is the rounding errors caused by 
finite precision floating point arithmetic, so the fundamental reason for irreversibility is just 
that floating point arithmetic is not associative. What we are really studying is how much 
the MD evolution amplifies this noise. 

For fermionic systems we can also introduce irreversibility by choosing the starting vector for 
the iterative linear equation solver time-asymmetrically, as we do if we use a chronological 
inverter, which takes some extrapolation of the previous solutions as the starting vector. 
Of course we do not have to use chronological inverter, we can start with a zero vector; 
alternatively we can find the solution sufficiently accurately that is does not depend on the 
initial guess. 

A way that we may study the irreversibility is to follow a trajectory for time r, then reverse 
the momenta and follow it back again; in other words we compute U 0F0U oF, which in exact 
arithmetic should take us back exactly to where we started. We then measure the distance 
A in phase space from our starting point, using some suitable norm. What we observe is 
that the rounding errors are amplified exponentially with the trajectory length, A oc r u , we 
call the exponent v the Liapunov exponent. In practice if we work with parameters such 
that the Liapunov exponent is small then the resulting irreversibility is not a big problem, 
because the 'seed' rounding errors are of order 10~ 7 for 32-bit floating point arithmetic and 
1CT 15 for 64-bit precision: rounding errors fall exponentially with the number of bits we use 
to represent floating point numbers. 

In Figure 2 we show data for pure SU(3) gauge theory and full QCD (both on tiny lattices) 
where the Liapunov exponent is plotted as a function of the integration step size 5r. 

The fact that for small step size the Liapunov exponent v ^ appears to tell us that 
the underlying continuous-time equations of motion for gauge field fictitious dynamics are 
chaotic. In QCD the Liapunov exponent appears to scale with j3 as the system approaches 
the continuum limit, that is lim^oo z/£ ps constant where £ is the correlation length as 
before. This can be interpreted as saying that the Liapunov exponent characterizes the 
chaotic nature of the continuum classical equations of motion, and is not a lattice artefact. 
If this is so we should not have to worry about reversibility breaking down as we approach 
the continuum limit, as the amplification factor for a trajectory of length r £ stays fixed. 
However, beware that this is based on data from tiny lattices, and is not at all conclusive. 
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Figure 2: The Liapunov exponent v is shown as a function of the integration step size St. The 
top graph is for pure SU (3) gauge theory, the middle one is for QCD with heavy dynamical 
Wilson quarks, and the bottom one is for QCD with light dynamical Wilson quarks. Note 
that the scale for the light quark case is quite different from the other two. The error bars 
show the standard deviation of measurements made on three independent configurations. 
All the data is from 4 4 lattices. 



More significantly, at some particular value of the step size, 5t ~ 0.6 for the top two graphs 
and 5t w 0.1 for the bottom one, the Liapunov exponent start to increase, perhaps linearly. 
This behaviour is what we expect when the integrator has become unstable. For the top two 
graphs this is not very interesting, as the change in energy along a trajectory of length r ~ 1 
at this step size is very large, 5H 3> 1, so the acceptance rate is essentially zero anyhow. 
For the bottom graph — the case with light fermions — it is the instability rather than the 
'bulk' behaviour of 5H that limits the step size we can get away with. We shall investigate 
on way to circumvent this problem in §3.8. 
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3 The RHMC Algorithm 



In this second lecture, we will discuss the Rational Hybrid Monte Carlo (RHMC) algorithm. 
[7] For this purpose we shall first take a brief look at approximation theory. 

3.1 Polynomial Approximation 

We start by consider the problem of approximating a continuous function by a polynomial. 
It turns out that the rational approximation theory is simple generalization of this. 

We want to address the question of what is the best polynomial approximation p(x) to a 
continuous function f(x) over the unit interval [0, 1]. To address the question we have to 
specify an appropriate norm on the space of functions. The most important class of norms 
are the L n norms, defined by 



which satisfies the required axioms 13 provided n > 1. The case n — 2, for example, is usual 
Euclidean norm. The case n — 1 is the L\ norm which was used in the proof of convergence 
of Markov processes in §2.4. The minimax norm is defined by taking the limit n — > oo 



because when n — > oo the integral is dominated by the point at which the error p(x) —f(x) is 
maximal. What we want to do in the following is to find optimal polynomial approximations 
with respect to the norm, since this guarantees a bound on the pointwise error. We can 
generalize these definitions by including a positive weight function, but we will not consider 
this further in this introduction. 

The fundamental theorem on the approximation of continuous functions by polynomials 
was given by Weierstrass, who proved that any continuous function can be arbitrarily well 
approximated over the unit interval by a polynomial. This result is very important in 
functional analysis, where it is the essential ingredient in many proofs, although it is of 
less significance in finding approximations for practical use. The most elegant proof of 
Weierstrass' theorem was given by Bernstein, who showed that the Bernstein polynomials 



can arbitrarily well approximate / over the unit interval by taking n to be sufficiently large, 
that is lim^oo \\p n - fW^ = 0. 

13 These are that ||/|| > 0, + < ji/f+IMI. Il«/ll = \ a \ 11/11 v />0> where a is a scalar : and 11/11 = iff 





(5) 




/ = almost everywhere. 
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3.2 Chebyshev's theorem 



If we restrict ourselves to considering polynomials of some fixed degree then the Bernstein 
polynomial has no reason to be the best approximation to /. What we really want is a 
solution to the 'minimax' problem: that is to find a polynomial p of fixed degree which gives 
the minimum error 

A = min||p- /Hoc = min max \p(x) - f(x)\. 

p p 0<a;<l 

A surprisingly elegant solution to the problem of finding the best approximation of fixed 
degree was given by Chebyshev, who proved that for any degree d there is always a unique 
polynomial p that minimizes the L\ norm (5), and it is characterized by the criterion that 
the error p(x) — f(x) attains its maximum absolute value at exactly d + 2 points on the unit 
interval, and the sign of the error alternates between successive maxima. 

We shall prove Chebyshev's theorem in two steps; we will first establish the necessity of the 
criterion stated above, and then prove its sufficiency 




Figure 3: An illustration of the proof of necessity of Chebyshev's condition, p is a polynomial 
of degree d = 7 that approximates a continuous function /. The heavy line shows the error 
e — p — f that attains its maximum value A = ||p — /||oo at only d + 1 = 8 points (including 
one end point of the interval). The next largest maximum of \e(x)\ has a value of A', as 
shown. We therefore construct the polynomial q that passes through a zero of e between 
each of the points where e(x) = ±A; there are d = 7 such points, so q is also a polynomial 
of degree d = 7. It is chosen to have the opposite sign to e at each of the points where 
\e(x)\ = A, and a maximum value of |(A — A') over the interval. It is shown as the dotted 
line on the graph. The polynomial p + q is then a better approximation than p, as shown by 
its error p + q — f = e + q which is the light gray line. 

Suppose that p is an optimal polynomial of degree d for which the error e(x) = p(x) — f(x) has 
less than d+2 alternating extrema of equal magnitude; then at most d+1 of its maxima can 
exceed some magnitude A'. This defines a 'gap', A — A'. As e is continuous it must have at 
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least one zero between two successive maxima; let us suppose these occur at Xi (i — 1, . . . , d). 
We can construct a polynomial q of degree d which has the opposite sign to e at each of 
the d+ 1 maxima, q(x) = kYYi=i(x — Xi), and whose magnitude is smaller than the 'gap' 
by a suitable choice of the magnitude and sign of the constant k. But this contradicts the 
assumption, for the polynomial p + q is a better approximation than p to /. 




Figure 4: An illustration of the sufficiency of Chebyshev's criterion. The polynomial p of 
degree d = 8 satisfies Chebyshev's criterion, as indicated by the graph of its error e = p — f 
(light gray line) which has d + 2 = 10 alternating extrema of equal magnitude on the 
interval (including the end points), p' is another polynomial also of degree d = 8 with 
a smaller error, He'Hoo < ||e||oo, as shown on the graph by the solid line. The difference 
p' — p = (p' — f) — (p — f) = e' — e shown by the dotted line then must have d + 1 = 9 zeros on 
the interval since e(x) — e'(x) has an alternating sign at each of the extrema of e (as shown 
by the arrows), because otherwise the maximum error of |e'| would not be smaller than that 
of e. Since p — p' is a polynomial of degree d = 8 this means it must vanish identically by 
the fundamental theorem of algebra. 

Sufficiency is shown as following. Suppose that p satisfies Chebyshev's criterion, and there is 
a better polynomial approximation p' than p, i.e., \\p'— f\\oo < \\p— f\\oo] then \p\xi)— f(xi)\ < 
\p(xi) — f(xi)\ at each of the d + 2 extrema. But then p' — p must have d + 1 zeros on the 
unit interval, thus p' — p = as it is a polynomial of degree d. 



3.3 Chebyshev Polynomials 

The role of Chebyshev polynomials in approximation theory causes some confusion. The 
expansion of a function / in Chebyshev polynomials up to degree d does not, in general, give 
the best approximation of that degree. What the Chebyshev polynomials do give us it the 
best approximation of degree d— 1 to the function x d over the compact interval [—1, 1]; this 
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is given by 

p d ^ x )^ x ^-^Y-i Td (x), (6) 
where the Chebyshev polynomials are defined by 

Td{x) = cos(rfcos _1 (a;)). 

The use of the letter T is from an old transliteration of Chebyshev's name. Perhaps the most 
surprising thing about Chebyshev polynomials is that they are polynomials, despite being 
defined in terms of transcendental functions. This is shown by expanding 

cos(dC) = Re e id ^ = Re(cos f + i sin f ) d = Re (^\ (cos f ) d_J '(* sin j 

j'=o 

L^/2J / ,x 

= E( 2 J( cos O d - 2fe (-i) fe [i-(coso 2 ] fe , 

fc=0 ^ ' 



from which we immediately see the polynomial nature of Td(x): indeed writing x = cos£ we 
may proceed to obtain an explicit form 



M/2J k , , X / , X 

i>— n c— n V / V / 



fc=0 ^=0 

From this we see that the leading monomial is 



fc=0 x 7 j=0 w ^j=0 XJ/ j=0 

= {(1 + l) d + (1 - l) d }^- = 2 d ~ 1 x d . 
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The error in the approximation of equation (6) is 

which obviously satisfies Chebyshev's criterion because cos<i£ takes the values ±1 exactly 
d + 1 times on the interval < £ < n. 

In this case the error falls exponentially with the degree d because the coefficient of the 
leading term in the Chebyshev polynomials grows as 2 d_1 , but in general such exponential 
convergence does not occur. 



3.4 Chebyshev Optimal Rational Approximation 

Chebyshev's theorem is easily extended to rational functions (ratios of polynomials); the 
criterion that a rational approximation of degree (n, d) is optimal is that its error must 
attain its maximum magnitude at n + d + 2 alternating points on the interval. 
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Rational functions with nearly equal degree numerator and denominator are usually the 
best choice. An approximation with n < d cannot be better than one with n = d unless 
Chebyshev's criterion would lead to the leading d — n coefficients of the latter vanishing; this 
usually only happens when there is an obvious reason for it, such as trying to approximate a 
rational function by one of higher degree, or trying to approximate an odd function in which 
case the optimal approximation will generally be of degree (d ± l,d). 

An empirical observation is that convergence is often more often exponential for optimal 
rational approximations than for polynomial ones, and that for a given degree rational func- 
tions usually give a much better approximation, which probably is not surprising. 

There are only a few special cases where it is possible to compute optimal polynomial or 
rational approximations in closed form: we have seen one in §3.3 and we will come across 
another in §4.7.2. In general we have to find the optimal approximations numerically. A 
simple (but somewhat slow) numerical algorithm for finding the optimal Chebyshev ratio- 
nal approximation was given by Remez. The Remez algorithm is basically just an way of 
implemeting Chebyshev's proof; it alternates (i) a phase of searching for an alternating set 
of n + d + 2 points at which the global maximum of the error occurs, and (ii) a phase of 
adjusting the n + d + 1 coefficients of the rational function to make the magnitude of the 
error take the same value on the set of points found in the previous phase. A little though 
will explain why it is difficult to find the optimal approximation to a function like sin K 

A realistic example of a rational approximation to the function 1 / ^fx is 

1 ^ (z+2.3475661045)(z+0.1048344600)(>+0.0073063814) 
^- ~ - 3904603901 ( :r+ o.4105999719)(a;+0.0286165446)(x+0.0012779193) " 

This is accurate to within almost 0.1% over the range [0.003,1]. Using a partial fraction 
expansion of such rational functions allows us to use a multishift linear equation solver, thus 
reducing the cost significantly. The partial fraction expansion of the rational function above 
is 

1 ~ n •wrMfimam-L 0-0511093775 , 0.1408286237 , 0.5964845033 



^— ■ - , x+0 . 0012779193 1 x+0.0286165446 1 z+0.4105999719 " 

This is numerically stable because all the terms are positive and the roots of the denominators 
are all real. The fact that this desirable property holds for optimal rational approximations 
to x a for — 1 < a < 1 is one of the principle reasons for the efficacy of the RHMC algorithm, 
but why it does is not at all clear. 

It is interesting to compare the optimal rational and polynomial approximations for 1 / ^Jx, as 
this is one of the few cases where we can analyze their convergence analytically. The optimal 
rational approximation of degree d±l,d over the interval e < \x\ < 1 with a weight function 
w(x) = \fx, i.e., where we minimise the maximum relative error, was found analytically 
by Zolotarev (§4.7.2) and has an error that falls as A ~ exp(rf/lne). The optimal L 2 
approximation with respect to the weight function 1 / \J\ — x 2 is given by the Fourier series 

f (~1) J 4 rp ( , 

^(2jTIF T2j+l(a;) ' 
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which has L 2 error that falls as 0(1/ d). The optimal approximation cannot be too much 
better, or it would lead to a better L 2 approximation, so it certainly only converges as 1/d. 
Note that we are not being entirely fair in this comparison, as the weights, as well as the 
approximation intervals, are different in the two cases. 

It is also worth noting in comparing rational and polynomial approximations that using 
optimal polynomial approximation to 1/x to compute the inverse of a matrix (§3.6) is es- 
sentially the same as using a Jacobi iteration scheme, and this is well-known to be inferior 
to Gauss-Seidel iteration or Krylov space methods. 

3.5 Non-Linearity of CG Solver 

We have now completed our survey of approximation theory, [8] and we turn to considering 
its application in the RHMC algorithm. To motivate the need for this let us consider the 
following suggestion of a method to accelerate matrix inversion. 

Suppose we want to solve a system of linear equations A 2 x = b where A is a positive 
Hermitian matrix using the conjugate gradient (CG) method. It is well known that it is 
better to solve Ay = b and Ax = y successively, because the condition number 14 is reduced 
from k(A 2 ) = k(A) 2 to 2k(A). 

Suppose now that we want to solve Ax = b. Why don't we solve \[~Ay = b and \f~Ax = y 
successively? The square root of A is uniquely defined as A > 0, and we can even choose 
it to be positive too. Indeed, for application to quantum field theory most of our time is 
spent finding (M^M)~ 1 x, and the fermion kernel is manifestly positive, M^M > 0. All this 
generalizes trivially to n th roots, so it seems that we have the opportunity of reducing the 
cost even further. 

The question that needs to be addressed is how do we apply the square root of a matrix? 

3.6 Rational Matrix Approximation 

Before we can investigate how to evaluate functions of matrices we need to define what we 
mean by them. For our purposes we only need to define a function of a Hermitian matrix 
H, and this can be diagonalized by unitary transformation H = UDU^ 1 . We define the 
function f(H) just by changing to the basis where H is diagonal, applying / to the diagonal 
elements (eigenvalues), and then transforming back to the original basis, 

f(H) = f(UDU- 1 ) = Uf(D)U~\ 

14 The condition number of a matrix is defined to be the ratio of its largest and smallest eigenvalues. To a 
first approximation the cost of solving a system of linear equations is proportional to the condition number. 
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The great advantage of rational functions of matrices is that they do not require this diago- 
nalisation to be carried out explicitly, since by linearity 



aH m + pH n = U(aD m + pD n )U- x and H' 1 = UD- 1 U~ 1 . 



It suffices to compute the rational function using the operations of matrix addition, matrix 
multiplication, and matrix inversion in place of their scalar analogues. Furthermore, if we 
only want the result of applying the matrix function to a vector, f(H)v, then we can use 
the appropriate matrix-vector operations rather than the more costly matrix-matrix ones; in 
particular we can use a linear equation solver rather than finding a complete matrix inverse. 

3.7 'No Free Lunch' Theorem 

We now have an efficient way of computing matrix functions by finding a good rational 
approximation to the function and applying it using matrix operations. Let us return to our 
problem of solving Ax = b by solving n systems of the form A x l n x = y successively: for each 
of these we use a rational approximation to compute 15 x = A~ x l n y. To do this, we must 
apply the rational approximation of the n th root r(A) ~ A x l n . This is done efficiently by 
expanding the rational function in partial fractions and then applying all the terms at once 
using a multishift CG solver. The condition number for each term in the partial fraction 
expansion is approximately that of the original matrix k(A), so the cost of applying A x l n 
is proportional to k(A). We have therefore found out why our suggested inversion method 
fails: even though the condition number k (A l/n ) = K(A) 1/n and «(r(A)) = n(A) l / n the cost 
of applying r(A) is k(A), so unfortunately we don't win anything. 

3.8 Multiple Pseudofermions 

Let us return to the topic we were considering in the previous lecture (§2.15), where we 
introduced pseudofermions as a means of representing the fermion determinant. Recall that 
we rewrote the determinant as a bosonic functional integral over a pseudofermion field <f> 
with kernel M.^ 1 



What we are doing is to evaluate functional integrals that include the fermion determinant 
det Ai by using a stochastic estimate of the determinant, namely we are approximating the 
integral over the pseudofermion fields by a single configuration — in other words evaluating 
the integrand at only one point — yet this produces a Markov process with exactly the 
correct fixed point. We again seem to be getting something for nothing, so perhaps we 
should be suspicious that we are paying a hidden price somewhere. 

15 We save a lot of work here by noticing that it is just as easy to approximate A~ x / n as it is to approximate 
A 1 /™, so there is no need for nested CG solvers. 



det M oc 
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The hidden price is that we are introducing extra noise into the system by using a single 
pseudofermion field to sample this functional integral. This noise manifests itself as fluctu- 
ations in the force exerted by the pseudofermions on the gauge fields, and in turn 



• this increases the maximum fermion force; 

• which triggers the integrator instability; 

• which requires decreasing the integration step size. 

A better estimate is to write the fermion determinant as 

detM = [detAl 1/n ] n , 
and introduce a separate pseudofermion field for each factor 



n „ 

detM = [det M 1/n ] oc J [ / d<f>)d<f>j exp -<p)M~ 1,n <p^ 

7 = 1 "> 



3.8.1 The Hasenbusch Method 

Before we go further into the RHMC implementation of multiple pseudofermions we ought to 
note that the idea of using multiple pseudofermions was originally due to Hasenbusch. [9] He 
considered a theory with the Wilson fermion action Ai(n) where k is the hopping parameter 
which controls the fermion mass. He introduced a heavy fermion kernel Ai' = A4(k'), and 
made use of the trivial identity following from the associativity of matrix multiplication 
M = M'(M'~ l M) to write the fermion determinant as det M = det Ai' det(Af' ~ l M). He 
then introduced separate pseudofermions for each determinant, and tuned k' to minimise 
the cost of the overall computation. This can be all be easily generalized to more than two 
pseudofermions and to the clover- improved Wilson action. 



3.9 Violation of NFL theorem 

Let us now return to using our n th root trick to implement multiple pseudofermions. As 
before we observe that the condition number of the n th root K,(r(Ai)) = /t(Ai) 1//ra , and we 
may use the simple model that the largest contribution to the force acting on the gauge fields 
due to one of our pseudofermion fields is inversely proportional to the smallest eigenvalue 
of the fermion kernel, at least when the fermion mass is sufficiently small. This follows by 
considering equation (4), and recalling that we used Ai}Ai where we now have Ai 1 ^. In fact 
we might even expect the force to grow faster than this, but let us stick with our simple model. 
As the largest eigenvalue of At is more-or-less fixed this force is proportional to K,{Ai) l ^ n ; we 
have n pseudofermions each contributing to the force, and if we are conservative and assume 
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that these contributions add coherently we get a total force proportional to nn(M.) l l n . This 
is to be compared to the original single pseudofermionic force of k(M), so we expect that 
the maximum force is reduced by a factor of nn(M)" 1 . This is a good approximation if 
the condition number is dominated by a few isolated tiny eigenvalues, which is just the what 
happens in the cases of interest. If the force is reduced by this factor the according to our 
previous considerations (§2.16) we can increase the step size by the reciprocal of this factor, 
and all other things being equal, the overall cost is reduced by a factor of tik(M)" L . 

If we take this model seriously then we can easily calculate the optimal number of pseudo- 
fermions by minimising the cost, and we find that the optimal value is n opt ~ ln(«(M)), and 
the corresponding optimal cost reduction is eln k/k. 

So, by introducing a small number of pseudofermion fields (of order ln/c(A / f)) we expect 
to get a cost reduction of order 1/k(A4), and this works in practice — at least there is 
a significant cost reduction, the exact scaling laws of our simple model are undoubtledly 
only followed very approximately at best — thereby violating the infamous 'no free lunch' 
theorem. 

The advantage that this method has over the Hasenbusch method is that it is trivially 
applicable to any fermion kernel and that no parameter tuning is required: the condition 
number is automatically equipartitioned between the pseudofermion fields. On the other 
hand, the Hasenbusch method has the advantage that one of the pseudofermions may be 
cheap to apply because it is heavy. 



3.10 Rational Hybrid Monte Carlo 

Let us now go through the details of the Rational Hybrid Monte Carlo (RHMC) algorithm for 
the fermion kernel 16 (Ai^A4) 1 ^ 2n . In §2.9 we explained how the HMC algorithm alternates 
two Markov steps — momentum refreshment and MDMC — both of which have the desired 
fixed point and together are ergodic. When we include pseudofermion fields we need to add 
a third Markov step to ensure ergodicity, which is sampling the pseudofermion fields from a 
heatbath. This is easy to do because the (pseudo)fermions only appear quadratically in the 
action: 17 we sample the pseudofermions from a Gaussian heatbath by applying the square 
root of the kernel to Gaussian-distributed random noise, 

X 3 = QM T .M ) 1/4 %- with P(fc) xt 

16 This corresponds to l/n th of a fermion 'flavour'; we take advantage of the fact that it is no more work 
to take the 2n th root of A4^A4 than to take the n th root of A4, so we can ensure that the fermion kernel is 
manifestly positive for no extra cost. 

17 We could not get away with this simplification if we wanted to improve the fermion action by including 
four- fermion operators or higher. 
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as then 
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oc exp 

We then refresh the momenta 18 , and integrate Hamilton's equations for the gauge fields using 
the force (c.f., equation (4)) 



7T : 



<9,S B x d{M^M)- l l 2n 



d<j> ^ ^ <90 
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Finally we complete the MDMC step by applying a Metropolis test. 

Of course we apply all the fractional powers of the matrices that occur using optimal rational 
approximations expressed in partial fraction form. Note that 

• we use an accurate rational approximation r(x) ~ A ^fx for the pseudofermion heatbath; 

• we use a less accurate approximation for the MD evolution, f(x) « 2 ^/x; 

• we use an accurate approximation for Metropolis acceptance step. 



This is because any errors in the pseudofermion heatbath or Metropolis test would effect 
the fixed point distribution, whereas errors in the MD only effect the acceptance rate. If 
the rational approximations used in the pseudofermion heatbath and the Metropolis test are 
accurate to machine precision then the algorithm is still as exact as HMC: that is, the only 
systematic errors are the rounding errors from using finite precision floating point arithmetic, 
and from the fact that the Markov process may not have converged well enough to its fixed 
point. It is not hard to generate optimal rational approximations that are good to machine 
precision: this should not be surprising as this is how most scalar functions (exponential, 
logarithm, fractional powers, etc.) are evaluated anyhow. 

Let us summarize the key properties of the RHMC algorithm. 



We apply all rational approximations using their partial fraction expansions: 

— the denominators occuring in this are all just shifts of the original fermion kernel; 

— all poles of optimal rational approximations are real and positive for cases of 
interest (Miracle #1); 



18 There is no a priori reason why we have to refresh the pseudofermions and the momenta with equal 
frequency, but there is no obvious benefit from not doing so. 
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— only simple poles appear (by construction, a double pole can only occur if we try 
to approximate something that was itself a square, which would not be a sensible 
thing to do). 

• We a use multishift solver to invert all the partial fractions using a single Krylov space: 
the cost is dominated by the construction of the Krylov space; updating the n solution 
vectors is comparatively cheap, at least for O(20) shifts. 

• The result is numerically stable, even in 32-bit precision. This is because all the partial 
fractions have positive coefficients (Miracle #2). 

• The MD force term is of the same form as for conventional HMC except for a constant 
shift for each partial fraction; therefore the method is immediately applicable to any 
fermion kernel for which we can implement conventional HMC. 

3.11 Comparison with R Algorithm 

Let us now look at some empirical (numerical) studies [10] that compare the performance of 
the RHMC and R algorithms for non-zero temperature QCD with staggered quarks (§3.11.1) 
and for chirally symmetric domain wall fermions (§3.11.2), where the R algorithm has been 
the method of choice in the past. 

3.11.1 Finite Temperature QCD 

First we compare the of the algorithms' performance near the chiral transition point. The 
aim of this study was to see how the RHMC algorithm behaves in this regime, and to compare 
its cost with that of the R algorithm. 



Algorithm 


5r 


P 

1 acc 


B, 


R 


0.0019 




1.56(5) 


R 


0.0038 




1.73(4) 


RHMC 


0.055 


84% 


1.57(2) 



Table 1: Values of the Binder cumulant for (ipip), _B 4 , and the RHMC acceptance rate P acc . 

For these tests we used 2 + 1 flavours of naive staggered fermions with m ud = 0.0076 and 
m B = 0.25, and the Wilson gauge action on an 8 3 x 4 lattice. The MD trajectory length 
was t = 1. The results are given in Table 3.11.1, where the Binder cumulant of the chiral 
condensate, B4, is shown as a function of step size for the two algorithms. The RHMC 
acceptance rate, P acc , is also given. The correct value of the Binder cumulant is obtained 
with RHMC using a step size 29 times larger than is necessary for the R algorithm. 
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Figure 5: The mean plaquette is shown as function of (3 for the RHMC and R algorithms at 
the parameters given in the text (near the chiral transition point). The RHMC points have 
been displaced right so that they can be seen more clearly. For the R algorithm only the 
data at the smaller step size is within one standard deviation of the RHMC results. 

Figures 5 and 6 are plots of the plaquette and the chiral condensate (^ip) as a function of 
5t: only for the smaller step size is the R algorithm result consistent with that from RHMC. 

It is clear that RHMC is superior to the R algorithm in this case, principally because it is vital 
to keep the systematic step size errors under control when studying the properties of phase 
transitions. Even though the errors in the equilibrium distribution of the R algorithm Markov 
chain are of 0(Sr 2 ), even the introduction of a small admixture of unwanted operators in 
the action can significantly change the behaviour of a system near criticality; indeed, they 
can even change the order of the phase transition if we are not extremely careful when we 
take the zero step size limit. 

3.11.2 2+1 Domain Wall Fermions 

Lattice calculations with on-shell chiral dynamical fermions are still in their infancy, but 
it seems clear that they will become increasingly important as sufficient computing power 
becomes available. There are several methods to carry out such calculations, as we shall 
discuss in the next lecture (§4), but at present the only such computations carried out on 
large lattices with fairly light quarks have used the domain wall formulation. The results 
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Figure 6: The chiral condensate (tjjip) is shown as a function of (3 for the RHMC and 
R algorithms at the parameters given in the text (near the chiral transition point). For the 
R algorithm only the data at the smaller step size is within one standard deviation of the 
RHMC results. 



shown here were obtained as part of the UKQCD-Columbia-RBRC collaboration, and the 
parameters used are summarized in Table 3.11.2. 



Algorithm 




m s 


5t 


P 
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(P) 


R 


0.04 


0.04 


0.01 




0.60812(2) 


R 


0.02 


0.04 


0.01 




0.60829(1) 
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0.02 


0.04 


0.005 




0.60817(3) 


RHMC 


0.04 


0.04 


0.02 


65.5% 


0.60779(1) 


RHMC 


0.02 


0.04 


0.0185 


69.3% 


0.60809(1) 



Table 2: The different masses at which the domain wall results were gathered, together with 
the step sizes 5t, acceptance rates P acc , and mean plaquette values (P). The runs were all 
performed on 16 3 x 32 lattices with n 5 = 8 with the DBW2 gauge action at (3 = 0.72 

The step size dependence of the plaquette is shown in Figure 7. From this we can see that 
in order to obtain a result consistent within error bars the R algorithm needs an integration 
step size more than 10 times smaller than that used for RHMC, although the zero step size 
extrapolation using a quadratic fit to the R algorithm data gives a reasonable value. 

We expect that the autocorrelations for RHMC and R algorithms should be similar if carried 
out with the same trajectory lengths, except of course that the RHMC autocorrelations 
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0.6085 




Figure 7: The step size dependence of the plaquette for domain wall fermions with the 
parameter values given in Table 3.11.2. Note that the RHMC data point shown at 5t = 
(because it has no step size errors) is really at 5t = 0.04, which is far to the right of the graph. 
The quadratic extrapolation seems to work reasonably for the R algorithm data points, but 
the need to carry out this extrapolation significantly increases the estimated error. 



should be longer by a factor of the inverse acceptance rate. The integrated autocorrelation 
time for the rr propagator from the domain wall test with m u d = 0.04 is shown in Figure 8, 
and the data seems to bear this out. 

The conclusion from this study too is that the RHMC algorithm leads to a significant cost 
reduction in this case too. 



3.12 Multiple Pseudofermions with Multiple Timescales 

We now turn to a way that the RHMC algorithm allows us to implement UV filtering, that is 
to separate the short-distance ultraviolet contributions to the pseudofermion force from the 
long-distance infrared ones. The semi-empirical observation is that the largest contributions 
to the gauge field force from the pseudofermions do not come from the partial fractions 
with small shifts, but from those with large ones. To see why this may be so, look at the 
numerators in the partial fraction expansion that we exhibited earlier 



sfx 



0.0511093775 , 0.1408286237 , 0.5964845033 
+ s+0.0012779193 ' z+0.0286165446 ' x+0. 4105999719 ' 



Even for this low-order approximation the coefficient of the largest shift is more than an 
order of magnitude larger than that of the smallest shift. A large shift corresponds (in some 
sense) to a larger fermion mass, which is why the effects of the large shift partial fractions 
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Figure 8: The integrated autocorrelation of time slice 13 of the 7r propagator for domain 
wall fermions. 

may be thought of as short distance or ultraviolet modes. Figure 9 shows the situation for 
a more realistic situation. 

There are a couple of ways we can make use of this observation: we can use a coarser 
timescale for the more expensive smaller shifts, as explained in §2.14, or we can be less 
sophisticated (but sometimes more cost effective) and significantly increase the CG residual 
used in computing the force from the small shift partial fractions. We stress that this does not 
lead to any systematic errors provided we use a time-symmetric initial guess in constructing 
the Krylov space. 

It is worth noting that we are putting the force contribution from different partial fractions 
for each pseudofermion field on different timescales: the different pseudofermions themselves 
are treated entirely equally. 

3.13 L2 versus Force Norms 

Figure. 10 is Wilson fermion forces taken from the paper by Urbach et al. [11] Here the 
authors have used Hasenbusch's method (q.v., §3.8.1), and as can be seen there is a clear 
difference between the L2 norms of the force contributions and the norms. From our 
discussion of instabilities in §2.13.1 we may expect that the is the more appropriate 
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Figure 9: In this graph we plot the L 2 force as a function of the partial fraction a/ (x+/3). The 
value of the logarithm of the shift, In/?, is plotted on the horizontal axis, and it is clear that 
the values of In f3 are fairly unformly distributed. The coefficients of each pole, the residues 
a, are also plotted, and it can be seen that they increase rapidly with increasing values of the 
shift (3. We also show the measured L 2 norm of the force contribution from each pole, and 
these also increase with /3 for small (3, but reach a maximum at In/? = —1.7. The norm of 
the force contributions behaves similarly. We also show the values of a/(/3+0.125), which can 
be taken as a simple model of the behaviour of the force contributions. Finally we plot the 
number of CG iterations required to reach a fixed residual for each pole: this is proportional 
to the cost of computing the force, and decreases rapidly with (5. It is clear that the largest 
force contributions come from the poles that require only a cheap CG computation, and 
the expensive poles only contribute a small amount to the total force acting on the gauge 
field. We can taken advantage of this either by putting the small shift poles onto a coarser 
timescale, or by evaluating them much less accurately. 
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Figure 10: L 2 versus force norms. 

one to detect when a particular force contribution causes the integrator to become unstable. 
However, while the values of the two norms are very different the relative magnitudes of the 
force contributions are still fairly similar, so the difference may be moot. 

3.14 Berlin Wall for Wilson fermions 

Figure 11 shows the 'Berlin wall' for Wilson fermions, and is also taken from the paper by 
Urbach et ai, [11]. The estimated cost of generating 1,000 independent configurations on a 
24 3 x 40 lattice is plotted in units of Tflop years (3 x 10 19 floating point operations) as a func- 
tion of m 7T /m p . The solid line is the result for a conventional two-flavour HMC simulation 
with Wilson fermions performed by CP-PACS and JLQCD; the squares are results of work 
of Urbach et ai, where they introduced two or three pseudofermion fields using Hasenbusch's 
method and used different integrator step sizes for the various force contributions such that 
the force times step size St is roughly constant. The dashed and dotted lines are the ex- 
trapolations of these latter results using ansatze of (m^/m^)" 4 and (mjm^ 6 respectively. 
The triangles are the corresponding results for Staggered fermions, and the arrow indicates 
the physical pion to rho meson mass ratio. 

They show that the performance of Hasenbusch trick and multiple timescale is compara- 
ble to Liischer's SAP algorithm, which from our present point of view probably speeds up 
dynamical Wilson fermion computations because it too splits the pseudofermion force into 
three components. 
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Figure 11: The Berlin wall for Wilson fermions moves to much lighter quarks when multiple 
pseudofermions are introduced. 

3.15 Conclusions (RHMC) 

In summary, the advantages of RHMC are that it 

• is exact, there are no step size errors: hence no step size extrapolations are required; 

• is significantly cheaper than the R algorithm; 

• allows easy implementation of Hasenbusch-like tricks using multiple pseudofermion 
fields; 



is amenable to futher improvements, such as using multiple timescales for different 
terms in the partial fraction expansion. 
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So far RHMC has no obvious disadvantages. 



4 5D Algorithms for Chiral Lattice Fermions 

This final lecture section is devoted to five- dimensional formulations of chiral fermions. We 
begin by introducing on-shell lattice chiral fermions, following a logical rather than historical 
approach. 

4.1 On-shell chiral symmetry 

We shall write the various forms of the Dirac operator as ij) = • 7 M where the Dirac 
7 matrices in Euclidean space are necessarily Hermitian. We shall also assume all Dirac 
operators are 75 Hermitian, ffi = 75-^75. Note that while this is an assumption it does not 
seem to be very strong one, as almost all lattice Dirac operators satisfy it (except for the 
twisted-mass formulation). 

The key idea is that it is possible to have chiral symmetry on the lattice without doublers 
if we only insist that the symmetry holds on-shell. Liischer observed that on-shell chiral 
symmetry transformations should be of the form [12] 

^ _ e *a 76 [l-2 C a^ &nd ^_ > ^ e i«[l-2(l-C)o|>] 78j 

where we have introduced a free parameter (, for which you may choose your favourite 
value. As always, the choice of chiral transformation properties of a field is independent of 
the choice of Lorentz transformation properties, and we can assign tp and ijj independent 
choices of transformation as they are independent fields, despite the notation: the dagger 
just indicates that ^ — if)' has the same Lorentz transformation as the Hermitian conjugate 
of ip. In order that this be a symmetry the Dirac operator occuring in it must be invariant 
under the transformation 

Jj) _> gia[l-2(l-C)a#]75 ]fi e iaj 5 [l-2<;ap] = jfi _ 

As this must hold identically for all values of a the coefficient of a in its Taylor expansion 
must vanish, namely 

[1 - 2(1 - C)ai%5$ + #75[1 " 2Ca$] = 0, 
which is nothing but the Ginsparg-Wilson relation 

7 5 $ +$75 = 2a$ 75 $. (7) 
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4.2 Neuberger's Operator 



We can find the unique 19 solution of the Ginsparg-Wilson relation equation (7) in the fol- 
lowing way. We choose to parametrize the lattice Dirac operator by writing 

al/) = |(l + 757 5 ); 

this is no more that the definition of 75 = 7 5 (2gl0 — 1). The 75 hermiticity of the Dirac 
operator immediately implies that 75 is a Hermitian operator 75 = 75. If we also require the 
Dirac operator to satisfy the Ginsparg-Wilson relation, we immediately find that 75 = 1, so 
75, like 75, is both Hermitian and unitary. Ift must also have the correct continuum limit 
Ip -> Z N $ +0{a), so 

75 = ls(2aZ N - 1) + 0{a 2 ) = 75 - l) + 0(a 2 ), 

where we have defined M = Zw/2aZ^ and Ipw ~~ *• %w$ + 0(a). Here Ipw is some lattice 
Dirac operator, such as the Wilson operator, and Z N and Z w are wavefunction renormal- 
izations. The large negative mass M is called the domain wall height in the language of 
domain wall fermions, and is an unphysical cut-off-scale quantity. Its value is to be chosen 
to distinguish cleanly between the physical fermions and the doublers, the two getting sent 
to opposite ends of the Neuberger operator spectrum. Both of these conditions are satisfied 
if we define 

a/ (Ww — M)HjPw — M) 
This is the form given by Neuberger. [13, 14] 

It is often convenient to break chiral symmetry explicitly by adding a physical fermion mass. 
The massive Neuberger operator is 

$ N {H, H) = \[l + n+(l- /i) 75 sgn(tf)] , (8) 

where \x is the physical mass and H is the Hermitian Dirac operator kernel including the 
negative mass, H = H(—M) = 75 (0 — M). As shown in Figure 12, all the eigenvalue of the 
Neuberger operator lie on the unit circle. 

The immediate question is whether Ip ^ is a local operator. Although it is manifestly not 
ultralocal, it can be shown to be a local operator if the kernel Ipw has a gap in its spectrum. 
[15] The Wilson kernel Ip w certainly has a gap if the gauge fields are smooth enough to satisfy 
the admissibility condition that (P) < 1/30: this is a sufficient but not a necessary condition. 
Moreover, it seems reasonable that good approximations to the Neuberger operator will be 
local if Ip n is and vice versa. 
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Up to the choice of kernel, and assuming 75 hermiticity. 
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Im 




Figure 12: The operator 7575 is unitary and thus its spectrum lies on the unit circle. This 
means that the spectrum of the massive Neuberger operator defined by equation (8) lies on 
a circle of radius |(1 — fi) whose centre is at |(1 + /x), i.e., a circle whose minimum real part 
is fi = 0(a) and whose maximum real part is one. The physical states all lie near \x and the 
doublers are near one. 

4.3 5D Chiral Fermions 

In order to measure propagators, or to compute the pseudofermionic force in HMC, we have 
to find the inverse of the Neuberger operator ij) n. If we approximate the sgn function by a 
rational function of some sort then just applying it using our usual partial fraction technique 
requires the construction of a Krylov space. When we were considering RHMC before we 
noted (§3.7) that it is just as cheap to compute A~ x l n as it is to compute A 1//n , since the 
inverse of a rational function is another rational function. Sadly, this does not help us in 
the present situation, as the Neuberger operator is constructed from two non-commuting 
operators, J}) and 75. 

It transpires that one can trade-off this non-commuting property with a kind of dimensional 
reduction from five dimensions to four (this is of course a totally different fifth dimension 
from the fictitious time we introduced in §2.10). We are thus led to study five-dimensional 
formulations of chiral fermions where in exchange for having an extra dimension we get away 
with inverting the Neuberger operator within a single Krylov space. 

We shall introduce a class of five-dimensional algorithms that are described by four indepen- 
dent choices: 

• Kernel — what Dirac kernel is used as the argument of the sgn function in Neuberger's 
operator (§4.4). 

• Constraint — whether we use the five-dimensional system solely as a means of comput- 
ing propagators, or use it to construct a five- dimensional functional integral formulation 
(§4-6). 
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Approximation — what rational approximation of the sgn function is used (§4.7). 

Representation - - how we map the rational approximation into a five-dimensional 
system of linear equations (§4.8). 



4.4 Choice of Kernel 

The simplest kernel is the Wilson kernel with a large negative mass, Hw = ^^{Jj) w — M). 
Another popular kernel which arises naturally in the domain wall formalism is the Shamir 
kernel H T = 75-0 r where 

a 5 (l/) w -M) 

aU>T = 7tz r- 

2 + a^{$ w -M) 

A third kernel which interpolates between the preceding kernels is the Mobius kernel Hm = 
75-0 m, where 

(b + + b_)($ w -M) 



alt) 



M 



2 + (b + -b_)(lp w - M) 



Special cases corresponding to some choice of the two parameters b + and 6_ are shown in 
Table 4.4. 



Kernel 



Borigi/ Wilson 1 1 
Shamir/DWF g 5 

Table 3: Special cases of the Mobius kernel. 



4.5 Schur Complement 

The key to understanding the connection between four and five-dimensional formulations is 
the Schur complement of a matrix. Let us consider the five- dimensional block matrix 



AB 
CD 



where the blocks A, B, C and D are themselves matrices. M 5 may be block diagonalized by 
an LDU factorisation (also known as Gaussian elimination) 

AB 

Cd)~[ CA- 1 1) [OD- CA- l B / 

\ (\A- l B 

l B Mo 1 







r 


B 


CA- 




\0 D 


-CA 






( A 





CA- 




I D 


- CA 
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This decomposition can be done by inspection, and is unique. The bottom right block is the 
Schur complement. In particular, the determinant of M 5 is readily calculated by 



det (^;d) = det ( AD ~ ACA^B). 



4.6 Constraint 



How do we make use of a representation of the Neuberger operator as the Schur complement 
of some five-dimensional matrix? The first way is to use a five dimensional linear system to 
compute the required four-dimensional propagator. Consider the five-dimensional system of 
linear equations 

/ 0i \ M 



M = D, 



<Pn-2 
4>n-l 
V V> / 



o 




W 



= x. 



where D 5 is an n x n five-dimensional matrix, and $ and X are five-dimensional vectors 
constructed from n four-dimensional vectors as indicated. Using the block LDU factorisation 
(§4.5) our linear system reduces to 

L _1 Dr$ = L^LDU® = D<& = L~ X X = X, 



$ = U<S> 



where 

( *i\ 

02 

0n-2 
0n~l 

V v» J 

is the five-dimensional column matrix analogous to $ but with the unphysical constraint 
fields replaced with the equally unphysical 0, and with the same bottom component ip. 
Note that the inverse of a triangular matrix is of the same shape as the original matrix, 
so L^ 1 is a lower triangular matrix As D is block diagonal we have thus reduced the five- 
dimensional system to a family of uncoupled four-dimensional ones, the bottommost of which 
involves the Schur complement, which is just the Neuberger operator, = ij) = X- 

Therefore by solving the five-dimensional system = X the bottom component -0 of $ is 
the solution of the four-dimensional system we are interested in. Moreover, we shall see that 
we can construct D 5 such that it is linear in the kernel Ij) w and has as its Schur complement 
any rational approximation to the Neuberger operator we desire. 
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Alternatively, we may introduce a five-dimensional pseudofermion field $ = (0i 02 • • • <f> n -i "0) 
and write the five-dimensional pseudofermion functional integral as 

/n 
d&dQ e" $t ^ 1$ oc det D 5 = det LDU = det D = JJ det D^-. 

i=i 

The fermion determinant thus obtained is the product of the determinants of all the diagonal 
blocks Djj, wherease we only want the determinant of the Schur complement, D n n fa Ip N . 
We therefore also introduce n — 1 Pauli-Villars fields 20 to cancel them, 



n— 1 „ n— 1 



ft J- /l It -L 
„■ 1 J A 1 



3=1 " 3=1 



4.7 Approximation 
4.7.1 Hyperbolic Tangent 

The simplest approximation to the sgn function is tanh(ntanh _1 (a;)) . In some ways this is 
quite similar to the situation for Chebyshev polynomials (§3.3): there we had an apparently 
transcendental function that was in fact a polynomial, here we have an apparently transcen- 
dental function that is in fact a rational function. We will write the relevant formulae for 
even values of n; there is no problem with using odd values of n, but the formulae are just a 
little messier if we try to be too general. We first write note that 

sinh z e z — e~ z 1 — e~ 2z 

x = tanhz = — : — = = — , 

cosh z e z + e~ z 1 + e 2z 

solving this equation gives 
so we may write 



. 1 — x 

1 - e~ 2nz 1 ~ \ 1 + x 



n 



£ n -i,n( x ) = tanh(ntanh 1 x) = - - ^ 2nz = _ x n . (9) 

It should be noted that the leading term in the numerator is x 71 ^ 1 and not x n , as that term's 
coefficient vanishes for n even; this should not be surprising, as e is an odd function of x. 
The denominator vanishes when 



1 + X J 1+x 



20 Perhaps they would be better called pseudo-pseudofcrmion fields, as they are boson fields with a local 
kernel introduced to cancel the unwanted extra pseudofermion determinants. 
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or 



, 2 +ftan^±i^] =0. 

n 



Using a similar argument to find the roots of the numerator we obtain the factored form 



71 

m= 


l 
i 


x 2 + (tan^f) 2 




——l 
rifc=o 


x 2 + (tan ( fe+ ^) 2 j 



If we define £ = e 2z then 



1 - € 1 2 

£n-l,n{x) + 1 = ; . - + 1 = 



n-1 



• - ■ , £ - 0.' ' 

fc=0 



where £ fe = C^ fc satisfies = — 1 with £ = e 7 ™/™ and = £ 2 being a primitive n th root of 
unity, and 

2 2 2£ fc 



n-1 



n-1 



I V 

i=0 £=1 

The last equality follows from the fact that YYe=i ( x ~ ^t) ls a cyclotomic polynomial, 

n n—1 n— 1 n— 1 

^ = E-' = II(-—')^ IK 1 —')="• 



fc=0 



t=l 



We thus have 



2 n_1 £ 

s n _i, n (x) + i = — 



f-1 



fc=0 
-1 



fe=0 

0. 1/& 



6 



^n-l-k 



9 2 



n ^ 

fc=0 



1 + 



x sec 



(fc+i) 7 



2 (6 + VjfcK - 2 



which is the partial fraction expansion of e n - 1;n (x). 



= i + -E 



1+ (tan ;| 



k=0 x 2 + 



(tan^) 



2 ' 



4.7.2 Zolotarev's Formula 



The hyperbolic tangent is simple, but it is by no means the optimal approximation of a given 
order. The analytic form of the optimal approximation that satisfies Chebyshev's criterion 
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(§3.4) was found by Zolotarev, who was a student of Chebyshev. Again, Zolotarev's formula 
is most simply written in terms of transcendental functions, this time elliptic functions 



1 - 



sn(z; k) 2 



sn(z; k) 



[n/2\ 



sn (2iK'm/n; k) 2 



sn(z/M; A) 



M 



n 



sn(z; k) 2 



sn 



(2iK'{m-\)/n\ kf 



The way to understand this formula is that it expresses the elliptic function sn(z/M; A) as a 
function of sn(z; k), in just the same way that we expressed tanhnx as a function of tanhx, 
or cosnrr as a function of cosrr previously. Elliptic functions are doubly periodic analytic 
functions, and the Jacobi elliptic function sn(z; k) is defined by 21 



so as to have a real period of AK(k) and imaginary period of 2iK(k'), where k 2 + k' 2 = 1 
and K is the complete elliptic integral 



M and A are chosen such that the imaginary period of sn(z, k) is n times that of sn(z/M; A), 
2iMK(\') = 2iK(k')/n, while the real periods are the same, AMK(X) = AK(k). 

Figure 13 shows sn(z/M; X) as a function of sn(z;k). The role of the double periodicity 
is now apparent: the real period corresponds to the step in the sgn function at the origin, 
and the imaginary period corresponds to the equal height oscillations about ±1 required by 
Chebyshev's criterion. 

Figure 13 shows a surface plot of the real part of sn(z/M; A) for n = 6; the shading of the 
surface indicates the imaginary part which is zero along the indicated contour. The entire 
region of the complex plane shown is the fundamental region for sn(z; k), and corresponds 
to six fundamental periods stacked in the imaginary direction for sn(z/M; A). 

4.7.3 Errors 

We next compare these approximations to the sign function over the interval 10~ 2 < \x\ < 1. 
For both Zolotarev and tanh approximations we take rational functions of degree (7, 8) 
and compare them in Figure 15 on a log-linear plot. We can see that tanh is a better 
approximation near \x\ = 1, but becomes a much worse approximation when x is small. 
Indeed, the errors in the tanh approximation are symmetric about \x\ = 1: this was what 

21 This is the analogue of defining sin as the inverse of sin _1 (a;) = J* dt / \J 1 — t 2 . The inverse of sn is 
called the elliptic integral of the first kind, E(z; k). 
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Figure 13: The degree Zolotarev approximation to sgnx over the interval e < \x\ < 1. 
The real period of the elliptic function corresponds to the step in the sgn function, and the 
imaginary period corresponds to the equal height oscillations about ±1. 

motivated the introduction of the Mobius kernel, as a suitable choice of the parameters shift 
the spectrum of the kernel to better overlap with the region where the error for the tanh 
approximation is small. 

4.8 Representation 

The formalism developed in §4.6 requires that we have a (five-dimensional) matrix which has 
the desired approximation (§4.7) as its Schur complement. Constructing a variety of such 
matrices is the goal of this section. 

4.8.1 Continued Fraction 

We begin with the continued fraction representation. [16, 17] Consider the block (five- 
dimensional) matrix with four-dimensional blocks Aq, . . . , A n on the principal diagonal and 
unit matrices on the adjacent block diagonals, for example with n = 4 this is 

M> i o o \ 

1 A 1 1 
1 A 2 1 ' 

V o o i aJ 
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Figure 14: Surface plot showing the real part of sn(z/M;A) over the fundamental region of 
sn(z;k). The imaginary part vanishes along the contour indicated. 



it is straightforward to write its block LDU decomposition, 

/ 1 0\ /S o 0\ /lS^ 1 \ 



S, 



-l 



1 



Si 1 1 
V o S, 1 lj 

where So = A and S n + l/S n -i 
continued fraction 



5i 
S 2 
\0 S 3 / 



1 S± 
1 S 2 X 
\0 1 j 



S 3 = A 3 - — = A 3 

&2 



A n . The Schur complement of the matrix is thus the 
1 1 



Ao - 



Si 



A,- 



Ai - 



A ( 



We may use this representation to linearize our rational approximations to the sgn function 
expressed ClS db continued fraction, 



£n-l,n{H) = PqH 



1 
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s(x) - sgn(x) 
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Figure 15: The errors in approximations to the sign function, e(x) — sgn(a:), as a function of 
x. The dashed curve is tanh(8 tanh -1 x) — sgnx and the dotted curve is the corresponding 
error for the optimal (Zolotarev) approximation of the same degree over the interval 10 -2 < 
\x\ < 1. The tanh approximation does not depend on the choice of interval. 



by setting A n -j = (—lyPjH for j = 0, . . . , n. We can improve the condition number of our 
five-dimensional matrix by making use of the freedom to introduce arbitrary parameters into 
the continued fraction expansion by multiplying the numerator and denominator at level j 
by a constant Cj +1 , 

Ci 

£n-l,n(H) = PqH H 



C\C2 

c&H + — 



_ TT . C n —\C n 
CnPnH 
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which is the Schur complement of the block tridiagonal five-dimensional matrix 

^ CnfinH yj C n —\C n \ 

\Jc n -\c n —c n _i[3 n ^iH ■ ■ 

o ••. ••. ••. 

••. c 2 p 2 H ^c 2 

^ P HJ 



4.8.2 Partial Fraction 



Next we consider the partial fraction representation. [18, 19] To this end we consider a 
matrix of the form 

\ 



V 



X 


1 










Pi 










1 


PlX 

~w 

















X 
P2 


1 


1 








1 


P2X 





-1 





-1 





R 



J 



which in general has n 2 x 2 blocks along the principal diagonal with the corresponding =pl 
in the bottom row and rightmost column, and compute its block LDU decomposition: 



/ 



L = 





















Pi 

X 


1 























1 


















P2 

X 


1 







_Pi 


qix 


Qi 


_P2 






1 


X 


x 2 — q\ 


x + q 1 


X 


x 2 - q\ 


x + q 2 
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and 



Pi 
o 



D 



pi(x 2 - q\) 
xq\ 





/ 













x 

P2 






P2{x 2 - qp 
xq\ 









R + 



P2% 



U 





Pi 








Pi 






X 






X 







1 








Qix 

9 2 ~* 


Qi 










x z — q{ 


x + qt 








1 


P2 


P2 










X 


X 













1 


q 2 x 












x 2 - q 2 


x + q 2 














1 



PlX 

x 2 — q\ ' x 2 — q\ ) 
\ 



so its Schur complement is 



R + 



PlX 

x 2 — q\ 



+ 



P2X 



4' 



This allows us to represent the partial fraction expansion of our rational function as the 
Schur complement of a five- dimensional linear system 

PjH 



e n -i tn (H) - ^ Jj^Z 



4.8.3 Euclidean Cayley Transform 



Finally we consider a five-dimensional matrix of the following form [20, 21, 22, 23] 





1 














i 















1 





V 










c- ) 



and compute its block LDU decomposition 



L = 



f 


1 








0\ 




_ T -1 

1 2 


1 













_ T -1 


1 





\ 








_ T -1 


V 
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D 



and 



(I 















\ 





1 


















1 



















c_ 


- T^T^T^T, 


-1 






(I 








-T{ l C+ 











1 





-T^T^C + 














1 


-r 3 - 1 r 2 - 1 Tr 1 c, 




1 




V° 








1 


J 





so its Schur complement, in the general n x n case, is 

D n , n = C- T-'T-}, ■ .-T 2 l T{ l C + . 

Note that L does not depend on C±. If we define T — T\ • • ■ T n and C± = §(1— /i)±±(l+/i)7 5 , 
then this Schur complement becomes 



(1 + H) + (1 - //) 75 



l-T 



1 + T_ 

The rational approximation to the Neuberger operator can thus be written as 
U) N (H)^\ [(1 +//) + (!- A*)75£(if )] 



= D n>n (l)- 1 £) nin ( A i) = i 



+ + (1-^)7, 



l-T 
1 + T 



if we define T(x) to be the Euclidean Cayley transform 22 of sgn(:r) 

1 - T(x) 

em = — . . ; 

v ; l + T(x)' 

it is trivial to find T(x) because this relation is its own inverse, that is 

1 — six) 



Tlx) 



l + e(x)' 



For an odd function such as e(x) we have 

e(-x) = -e(x) <^> T{-x) 



T(xY 



this means that if Uj is a zero of T(x) then —ojj must be a pole, so T(x) can be written as 
a product of factors of the form 



2 A Cayley transform is the mapping 



U 



1 -iH 



UJj — x 
UJj + x 



l + iW 

between a Hermitian matrix H and a unitary matrix U. 



H = i- 



.U-l 
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Moreover, if e(0) = then T(0) = 1, so T = T{T 2 ■ ■ -T n . 



The Euclidean Cayley transform of the hyperbolic tangent approximation (§4.7.1) is imme- 
diately apparent from equation (9), 



so all the roots are degenerate, cjj = 1 (Vj). For the Zolotarev approximation (§4.7.2) the 
roots of its Euclidean Cayley transform are the values of x for which the e n -i in (x) = ±1, 
which are readily found in terms of elliptic integrals. 

We want to solve the four-dimensional linear system 



equation = D(1)X, whose bottom block is _D njn (/z)0 = D n n (l)x- Since D n n is the 

Schur complement this gives D nri (l)~ 1 D rin (/i)0 = x, so as before (q.v., §4.6) we find the 
bottom four-dimensional block <fi of the five-dimensional solution vector <3> is the solution of 
our four-dimensional problem. 

4.8.4 Relation to Domain Wall Fermions 

The Euclidean Cayley transform representation is a generalization of the usual domain wall 
fermion formulation. We consider the generalised domain wall with the Mobius kernel, for 




$ n {h)k\ (1 + n) + (1 - 



1 — T 



<p = £>„,„(!) l D rhn {^)(t) = x, 



1 + T 
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which one solves the five-dimensional linear system P> 5 (/x)$ = D 5 (1)X where 



DM 



( D<? 
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-nD<pPJ\ 




p_ 




p>fp+ 













^(3) 















r^ 4) p_ 




r,i 4 >p + 


\-nD_ 










^C5l 

p>Y J 


p_ 


D { _> / 


fD W 

















D® 


D (2) 




























p_ 










Df 


£,(4) 









V o 








p>V 5) 


D w J 








/ D W 


£>« 








^ 









D (_2) 


Df 










+ 










£>?> 


















D w 


£>?> 






v-M 5) o 








D (5)y 





(10) 



Here P± — ±(1 ± 75) are the chiral projectors, and 



r4 S) = « s [&± } (^-M)±l], 



with 



6^ + b { : } 



6 W _ b W 



b + -6_ 



and a s is a new free parameter that we may use to minimize the condition number of D 5 
as it does not effect the Schur complement. Shamir's domain wall is the special case with 
b + = a 5 , 6_ = 0, u s = 1, and a s = 1. For Chiu's Zolotarev variant we set u s to be the roots 
of the Euclidean Cayley transform of the Zolotarev approximation to the sgn function. 



We now continue our derivation from equation (10) by cyclically shifting the columns of the 
right-handed part left, D 5 — > D 5 V, where 



V = 



/10000\ 
01000 
00100 
00010 

\0 1 J 



p_ 



/00001\ 
10000 
01000 
00100 

\0 1 0/ 
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which is a unitary transformation, — V 1 ■ This gives 
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Df 
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D V) ) 






/£>£> 











D (l) 










Df 
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D w 













V o 















If we define = Df P± + D K 1> P T , then 



W; 



so equation (11) becomes 














Q ( - ] C + \ 


















qL 3) 


Qf 














Q<_ 4) 


Qf 





V o 










QfcJ 



where C± = |(1 — /i) ± ±(1 + /x)7 5 = P± — /uP T as before. Scaling out the matrix 



(11) 



Q = 



M 1} o o 

Q [ l ] 
Qf 
Q 







(4) 



\ 






\ Qfj 
the domain wall operator reduces to the form introduced before 



/ 


1 















_p-l 

J 2 


1 













- 


_ T -1 
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c- J 
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where 



= - [D&P+ + D^P_] _1 [D^P_ + D^P+l 
= a s (b^\$ w -M) + l)P + + a s (b { l\$ w -M)-l)P_ 

a s (b { °\$ w — M) + l)P_ + a s {b { l\lp w - M) - 1)P + 



i -i 



(&W + - M) + ((&« - &<_ -) )(^ -M) + 2) 75 

(&« + &L s) )(# w — M) — ((6? - b^)(]p w — M) + 2) 75 



-(6+ + - M) + ((6+ - - M) + 2) 75 



-i 



x 



= -75 



-{b + + — M) — ((&+ - b-){lj) w — M) + 2) 75 



X 7 5 



(b + + b_)(U) w -M) 
2 + (b + -b_)($ w -M) 
(b + + b_)(Ip w -M) 



l5 + UJ s 



2 + (b + -b_)(0 w -M) 



M 



75 - UJ S 



75 



75 



uj s + H M ' 

H M being the Mobius kernel introduced in §4.4. 
Note that -0^4 = Jpjj 1 — a satisfies 

{$dw>7 5 } = {$^,75} - 2a 75 = ^{^nM^'n ~ 2^75 = 0, 

so for valence measurements the 'external' propagator -0^4 ls ver y convenient, as it satis- 
fies the off-shell anticommutation relation. However, for dynamical calculations this would 
violate the Nielsen-Ninomaya theorem, and indeed as shown by Pelisetto renormalisation 
induces unwanted ghost doublers that prevent the off-shell Ward identity from being pre- 
served, so we cannot use -0 DW for dynamical ('internal') propagators: we must use D N in 
the quantum action instead. 



4.9 Chiral Symmetry Breaking 

In order to measure chiral symmetry breaking, we define Ginsparg- Wilson defect 

75$ + .075 - 2a$-y 5 $ = 75 A L . 

For the approximate Neuberger operator aJj) = ±[l+ 7 5£(if )] the defect is aA^ = | [1 — e 2 (H)]. 
If we use a Zolotarev approximation for e that covers the spectrum of the kernel H with 
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a minimax error of A this shows that 23 \\Al\\ < A, which enables us to bound the matrix 
elements of Al that must be present in all (on-shell) chiral symmetry violating corrections. 
There is no such simple bound if the tanh approximation is used. 

The residual mass may be defined by 

(tr G^ A L G) 

771 = — 

(trGtC) ' 

where G is the quark propagator. For DWF this has been shown by Brower et al. to be 
exactly the usual domain wall residual mass. The residual mass m rcs is just one moment of 
A L . 

4.10 Numerical Studies 

Preliminary numerical studies have been carried out comparing the various five-dimensional 
on-shell chiral fermion formulations introduced in this lecture. [24] These were carried out 
on an ensemble of 15 gauge configurations from the RBRC two-flavour domain wall fermion 
dataset. The lattice size was V = 16 3 x 32, and the parameters used to generate the ensemble 
were n s = 12 (the size of fifth dimension), the DBW2 action with (3 = 0.8, a domain wall 
height of M = 1.8, and a fermion mass of fx = 0.02. 

The costs of inverting the various five-dimensional operators were compared as a function of 
the residual mass m rcs , and in Figure 16 we plot the results for the most promising actions. 
When comparing different valence kernels the 7r mass was matched. In order to compare 
like with like all the Dirac operators were even-odd preconditioned, and no projection of the 
subspace corresponding to small eigenvalues of the kernel was attempted. 

4.11 Future Work 

There are still a lot of things to investigate about algorithms for dynamical on-shell chiral 
fermions: 

• Five- dimensional or four- dimensional dynamics. Should we introduce five-dimensional 
pseudofermions as in §4.6, or only have four-dimensional pseudofermion fields with a 
five-dimensional linear equation solver? Five-dimensional pseudofermions would seem 
to introduce extra noise into the system for no obvious benefit, so the latter seems 
to be a more promising choice. The Pauli-Villars fields that need to be introduced 
to cancel the effects of the unwanted five-dimensional pseudofermions also introduce 
extra noise, unless they are generated from the same noise fields. 

23 Ignoring terms of 0(A 2 ). 
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Figure 16: Relative cost in units of Wilson-Dirac matrix applications as a function of the 
residual mass m rcs for some five-dimensional actions. Note the Zolotarev domain wall data 
point was only measured on a single configuration, and therefore is plotted without any error 
bars. The Zolotarev approximations were over an interval of 0.01 < \x\ < 1, which does not 
completely cover the spectrum of the kernel: if the approximation was chosen to cover the 
spectrum, or the eigenvectors corresponding to eigenvalues |A| < 0.01 were projected out 
and handled exactly, then we would expect m rcs to be much smaller, probably for little extra 
cost. 



• Multiple pseudofermion acceleration. Do multiple pseudofermion fields allow for a sig- 
nificant increase in step size, as discussed in the second lecture? This has been studied 
for DWF, where the answer is affirmative, but it would be interesting to investigate it 
for four-dimensional pseudofermion dynamics. 

• Five- dimensional multishift solver. If we want to apply the RHMC multiple pseudo- 
fermion method discussed in the second lecture to four-dimensional pseudofermion 
dynamics then we would like to be able to use a multishift solver to evaluate all the 
partial fractions in the same Krylov space. Unfortunately we have no idea how do this, 
as a constant shift of the four- dimensional system, which is the Schur complement 
of our five- dimensional systems, is not a constant shift of the five-dimensional ma- 
trix. This makes either a Hasenbusch-like scheme (§3.8.1) or a nested four- dimensional 
solver look more promising at the moment. 
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• Nested Krylov solvers. Instead of using a five- dimensional Schur complement approach 
as discuss in this lecture we could use a nested 'inner-outer' solver. For the inner solve 
we use a multishift technique to apply the partial fractions of a rational representation 
of the sgn function, for the outer we may use it to implement multiple pseudfermion 
acceleration (as above), or multiple valence masses if we are carrying out valence mea- 
surements. The disadvatage is that we keep building four-dimensional inner Krylov 
spaces and then discarding them, which seems wasteful. At present, for a single solve 
(as opposed to a multishift outer solve), the five-dimensional solvers may be 5-10 times 
faster. There have been some studies of how the residuals for the inner solver should 
be chosen to ensure the convergence of the outer solver. 

• Tunneling between different topological sectors. Whatever algorithm we use for dy- 
namical GW fermions it will suffer from severe critical slowing down as /j, — > 0. The 
potential barriers between different topological charge sectors grow as ji gets small, 
and the sectors decouple for \i = 0. It may be thought that for jj, = we would never 
want to tunnel from the zero topological charge sector to one of a different topology, 
but we must be careful as it is not manifestly obvious that the topological charge zero 
sector is connected. 

• Reflection/refraction algorithms have been introduced to alleviate the problems with 
topology change. As the problems have so far only been seen on tiny lattices, where 
the integration step size 5r may be large compared to the width of the barriers, it 
remains to be seen how necessary or effective these approaches are. 
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